TOTAL VARIATION AND SEPARATION CUTOFFS ARE NOT
EQUIVALENT AND NEITHER ONE IMPLIES THE OTHER 


JONATHAN HERMON, HUBERT LACOIN, AND YUVAL PERES 

Abstract. The cutoff phenomenon describes the case when an abrupt transition occurs in the convergence of a Markov chain to its equilibrium measure. There are various metrics which can be used to measure the distance to equilibrium, each corresponding to a different notion of cutoff. The most commonly used are the total-variation and the separation distances. In this note we prove that the notions of cutoff for these two distances are not equivalent, by constructing several counterexamples which display cutoff in total-variation but not in separation, and others with the opposite behavior, including lazy simple random walk on a sequence of uniformly bounded degree expander graphs. These examples give a negative answer to a question of Ding, Lubetzky and Peres.
1. Introduction

Consider an irreducible discrete-time Markov chain $X = (X_t)_{t \ge 0}$, defined on a finite state space $\Omega$ (we call a chain finite if $\Omega$ is finite). We let $P$ denote its transition matrix. We further assume that $X$ is reversible, that is, that there exists a probability measure $\pi$ which satisfies the detailed balance equation

\[ \forall x, y \in \Omega, \quad \pi(x) P(x,y) = \pi(y) P(y,x). \]

This measure is unique because of irreducibility. Let us assume furthermore that our Markov chain is lazy, meaning that

\[ \forall x \in \Omega, \quad P(x,x) \ge 1/2. \tag{1.1} \]

A particularly important case of such a Markov chain is lazy simple random walk (SRW) on a simple graph $G = (V,E)$, in which case $\Omega = V$, $P(x,y) = \frac{1}{2} 1\{x = y\} + \frac{1\{\{x,y\} \in E\}}{2\deg(x)}$ and $\pi(x) = \frac{\deg(x)}{2|E|}$, where $\deg(x) := |\{y : \{x,y\} \in E\}|$ and $|\cdot|$ denotes the cardinality of a set.
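As a minimal illustration of the definitions above, the following Python sketch builds the transition matrix and stationary measure of lazy SRW on a small graph and checks detailed balance and laziness (1.1) in exact arithmetic. The helper name `lazy_srw` and the example graph are hypothetical choices made only for this illustration.

```python
from fractions import Fraction

def lazy_srw(edges, vertices):
    """Transition matrix P and stationary measure pi of lazy SRW on a simple graph."""
    nbrs = {v: set() for v in vertices}
    for x, y in edges:
        nbrs[x].add(y)
        nbrs[y].add(x)
    num_edges = sum(len(s) for s in nbrs.values()) // 2
    P = {x: {y: Fraction(0) for y in vertices} for x in vertices}
    for x in vertices:
        P[x][x] = Fraction(1, 2)                      # holding probability 1/2
        for y in nbrs[x]:
            P[x][y] = Fraction(1, 2 * len(nbrs[x]))   # 1 / (2 deg(x))
    pi = {x: Fraction(len(nbrs[x]), 2 * num_edges) for x in vertices}
    return P, pi

# Hypothetical example graph: the path 0-1-2-3 with the extra edge {1, 3}.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
P, pi = lazy_srw(edges, vertices)

reversible = all(pi[x] * P[x][y] == pi[y] * P[y][x]
                 for x in vertices for y in vertices)
lazy = all(P[x][x] >= Fraction(1, 2) for x in vertices)
rows_sum_to_one = all(sum(P[x].values()) == 1 for x in vertices)
print(reversible, lazy, rows_sum_to_one, sum(pi.values()))
```

Exact rational arithmetic is used so that the detailed balance check is an equality test rather than a comparison up to rounding.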

It is a classical result of probability theory that for any initial condition the distribution of $X_t$ converges to $\pi$ when $t$ tends to infinity. The object of the theory of mixing times of Markov chains is to study the characteristics of this convergence (see [16] for a self-contained introduction to the subject).

We denote by $P_x^t$ ($P_x$) the distribution of $X_t$ (resp. $(X_t)_{t \ge 0}$), given that $X_0 = x$. For any two distributions $\mu, \nu$ on $\Omega$, their total-variation distance is defined to be

\[ \|\mu - \nu\|_{\mathrm{TV}} := \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)| = \sum_{x : \mu(x) > \nu(x)} \big( \mu(x) - \nu(x) \big) = 1 - \sum_{x \in \Omega} \min(\mu(x), \nu(x)). \tag{1.2} \]

The worst-case total-variation distance at time $t$ is defined as

\[ d(t) := \max_{x \in \Omega} d_x(t), \quad \text{where } d_x(t) := \| P_x(X_t \in \cdot) - \pi \|_{\mathrm{TV}}. \]
The (total-variation) $\epsilon$-mixing-time is defined as

\[ t_{\mathrm{mix}}(\epsilon) := \inf \{ t : d(t) \le \epsilon \}. \]
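The definitions of the total-variation distance, $d(t)$ and $t_{\mathrm{mix}}(\epsilon)$ can be sketched numerically as follows. This is only an illustration under stated assumptions: the chain (lazy SRW on a cycle of length 6), the test distributions `mu` and `nu`, and all function names are hypothetical and not taken from the text.

```python
def tv_distance(mu, nu):
    """Total-variation distance via the half-L1 formula in (1.2)."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))

def step(rows, P):
    """One step of the chain: rows[x] is the law of X_t started from x."""
    n = len(P)
    return [[sum(row[k] * P[k][y] for k in range(n)) for y in range(n)]
            for row in rows]

def mixing_time(P, pi, eps=0.25, t_max=10_000):
    """Smallest t with d(t) <= eps, where d(t) is the worst-case TV distance."""
    n = len(P)
    rows = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]  # P^0
    for t in range(t_max + 1):
        if max(tv_distance(row, pi) for row in rows) <= eps:
            return t
        rows = step(rows, P)
    raise ValueError("did not mix within t_max steps")

# Illustrative chain: lazy SRW on a cycle of length 6 (uniform stationary law).
n = 6
P = [[0.0] * n for _ in range(n)]
for x in range(n):
    P[x][x] = 0.5
    P[x][(x + 1) % n] += 0.25
    P[x][(x - 1) % n] += 0.25
pi = [1.0 / n] * n

# The three expressions in (1.2) agree on a small hand-checkable example.
mu, nu = [0.5, 0.5, 0.0], [0.25, 0.25, 0.5]
tv_half_l1 = tv_distance(mu, nu)
tv_positive_part = sum(m - v for m, v in zip(mu, nu) if m > v)
tv_min_form = 1.0 - sum(min(m, v) for m, v in zip(mu, nu))
t_mix = mixing_time(P, pi)
print(tv_half_l1, tv_positive_part, tv_min_form, t_mix)
```

Since $d(t)$ is non-increasing in $t$, the first time at which it drops below $\epsilon$ is indeed the infimum in the definition.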

Similarly, the (worst-case) separation distance from stationarity at time $t$ is defined as

\[ d_{\mathrm{sep}}(t) := 1 - \min_{x,y \in \Omega} P^t(x,y)/\pi(y), \]

and the $\epsilon$-separation-time (the “$\epsilon$ separation-mixing-time”) is defined as

\[ t_{\mathrm{sep}}(\epsilon) := \inf \{ t : d_{\mathrm{sep}}(t) \le \epsilon \}. \]

When $\epsilon = 1/4$ we omit it from the above notation.

Next, consider a sequence of chains $((\Omega_n, P_n, \pi_n) : n \in \mathbb{N})$, each with its corresponding worst-case distances from stationarity $d^{(n)}(t)$, $d^{(n)}_{\mathrm{sep}}(t)$, its mixing and separation times $t^{(n)}_{\mathrm{mix}}$, $t^{(n)}_{\mathrm{sep}}$, etc. Loosely speaking, the total-variation (resp. separation) cutoff phenomenon is said to occur when over a negligible period of time, known as the cutoff window, the worst-case total variation distance (resp. separation distance) drops abruptly from a value close to 1 to near 0. In other words, one should run the $n$-th chain until time $(1 - o(1)) t^{(n)}_{\mathrm{mix}}$ (resp. $(1 - o(1)) t^{(n)}_{\mathrm{sep}}$) for it to even slightly mix in total variation (resp. separation), whereas running it any further after time $(1 + o(1)) t^{(n)}_{\mathrm{mix}}$ (resp. $(1 + o(1)) t^{(n)}_{\mathrm{sep}}$) is essentially redundant. Formally, we say that the sequence exhibits a total-variation cutoff (resp. separation cutoff) if the following sharp transition in its convergence to stationarity occurs:

\[ \forall \epsilon \in (0,1/2], \quad \lim_{n \to \infty} \frac{t^{(n)}_{\mathrm{mix}}(\epsilon)}{t^{(n)}_{\mathrm{mix}}(1 - \epsilon)} = 1 \quad \Big( \text{resp. } \lim_{n \to \infty} \frac{t^{(n)}_{\mathrm{sep}}(\epsilon)}{t^{(n)}_{\mathrm{sep}}(1 - \epsilon)} = 1 \Big). \tag{1.4} \]


It is a classical result (e.g. [16, Lemmas 6.13 and 19.3] or (6.9)) that under reversibility the separation and total-variation distances and mixing times can be compared as follows (the second line being an easy consequence of the first):

\[ \forall t \ge 0, \quad d(t) \le d_{\mathrm{sep}}(t) \le 1 - \big( 1 - \min(2 d(t/2), 1) \big)^2 \le 4 d(t/2), \]

\[ \forall \alpha \in (0,1), \quad t_{\mathrm{mix}}(\alpha) \le t_{\mathrm{sep}}(\alpha) \le 2 t_{\mathrm{mix}}(\alpha/4). \tag{1.5} \]
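The comparison above can be checked numerically on a small example. The sketch below is an illustration only: the chain (a lazy biased birth and death chain on 5 states) and the function names are hypothetical choices, and the first line of the comparison is tested at even times $t = 2s$ so that $t/2$ is an integer.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def biased_lazy_path(m, p=1/3, q=1/6):
    """Lazy birth and death chain on {0,...,m}: right w.p. p, left w.p. q, else hold."""
    P = [[0.0] * (m + 1) for _ in range(m + 1)]
    for x in range(m + 1):
        if x < m:
            P[x][x + 1] = p
        if x > 0:
            P[x][x - 1] = q
        P[x][x] = 1.0 - sum(P[x])
    weights = [(p / q) ** x for x in range(m + 1)]   # detailed balance: pi(x) p = pi(x+1) q
    total = sum(weights)
    return P, [w / total for w in weights]

def d_tv(Pt, pi):
    return max(0.5 * sum(abs(a - b) for a, b in zip(row, pi)) for row in Pt)

def d_sep(Pt, pi):
    n = len(pi)
    return 1.0 - min(Pt[x][y] / pi[y] for x in range(n) for y in range(n))

def first_time_below(values, eps):
    return next(t for t, v in enumerate(values) if v <= eps)

P, pi = biased_lazy_path(4)
n = len(pi)
Pt = [[float(i == j) for j in range(n)] for i in range(n)]   # P^0
tv, sep = [], []
for _ in range(401):
    tv.append(d_tv(Pt, pi))
    sep.append(d_sep(Pt, pi))
    Pt = mat_mul(Pt, P)

tol = 1e-12
first_line_holds = all(
    tv[2 * s] <= sep[2 * s] + tol
    and sep[2 * s] <= 1 - (1 - min(2 * tv[s], 1)) ** 2 + tol
    and 1 - (1 - min(2 * tv[s], 1)) ** 2 <= 4 * tv[s] + tol
    for s in range(201)
)
alpha = 0.25
t_mix_alpha = first_time_below(tv, alpha)
t_sep_alpha = first_time_below(sep, alpha)
t_mix_quarter = first_time_below(tv, alpha / 4)
second_line_holds = t_mix_alpha <= t_sep_alpha <= 2 * t_mix_quarter
print(first_line_holds, second_line_holds, t_mix_alpha, t_sep_alpha, t_mix_quarter)
```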


Another important family of distances is the family of $\ell^p$ distances ($1 \le p \le \infty$):

\[ \|\mu - \nu\|_{p,\pi} := \begin{cases} \Big[ \sum_{x \in \Omega} \pi(x) \, a_{\mu,\nu}(x)^p \Big]^{1/p}, & 1 \le p < \infty, \\ \max_{x \in \Omega} a_{\mu,\nu}(x), & p = \infty, \end{cases} \]

where $a_{\mu,\nu}(x) := |\mu(x) - \nu(x)|/\pi(x)$ (observe that $\|\mu - \nu\|_{1,\pi} = 2 \|\mu - \nu\|_{\mathrm{TV}}$). Note that the notion of distance to equilibrium and mixing time can be transposed to these distances by replacing $\| \cdot \|_{\mathrm{TV}}$ by $\| \cdot \|_{p,\pi}$ in (1.3). For $a \in (0,\infty)$ we denote the $a$-th $\ell^p$-mixing time by $t_{\ell^p}(a)$. Under reversibility, the distances can be compared as follows (see [3, Proposition 5.1]):

\[ t_{\ell^2}(a) \le t_{\ell^p}(a) \le 2 \, t_{\ell^2}(\sqrt{a}) \quad \text{for } p \in (2, \infty], \]

\[ \frac{1}{m_p} \, t_{\ell^2}(a^{m_p}) \le t_{\ell^p}(a) \le t_{\ell^2}(a) \quad \text{for } p \in (1,2), \tag{1.6} \]

where $m_p := \lceil p/(2(p-1)) \rceil$. Hence in some sense, up to a multiplicative constant, the different $\ell^p$ mixing times ($p \in (1,\infty]$) are equivalent. It turns out that under reversibility the notions of cutoff for these distances are also equivalent.
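For concreteness, the $\ell^p$ distance can be computed directly from its definition. The sketch below uses a hypothetical three-point example (the measures `pi` and `mu` and the function name are assumptions made for illustration) and checks the identity $\|\mu - \pi\|_{1,\pi} = 2\|\mu - \pi\|_{\mathrm{TV}}$ as well as the monotonicity of the norms in $p$.

```python
def lp_distance(mu, nu, pi, p):
    """l^p(pi) distance between mu and nu, with a(x) = |mu(x) - nu(x)| / pi(x)."""
    a = [abs(m - v) / w for m, v, w in zip(mu, nu, pi)]
    if p == float("inf"):
        return max(a)
    return sum(w * ax ** p for w, ax in zip(pi, a)) ** (1.0 / p)

# Hypothetical example on a three-point space.
pi = [0.5, 0.3, 0.2]
mu = [0.2, 0.3, 0.5]

tv = 0.5 * sum(abs(m - w) for m, w in zip(mu, pi))
l1 = lp_distance(mu, pi, pi, 1)
l2 = lp_distance(mu, pi, pi, 2)
linf = lp_distance(mu, pi, pi, float("inf"))
print(tv, l1, l2, linf)
```

Because $\pi$ is a probability measure, Jensen's inequality gives $\|\cdot\|_{1,\pi} \le \|\cdot\|_{2,\pi} \le \|\cdot\|_{\infty,\pi}$, which is the ordering the example exhibits.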
Theorem A (Chen and Saloff-Coste [7]). Let $(\Omega_n, P_n, \pi_n)$ be a sequence of reversible lazy Markov chains. Let $\lambda_2^{(n)}$ be the second largest eigenvalue of $P_n$. Then the following assertions are equivalent:

• The sequence exhibits $\ell^p$-cutoff for some $1 < p \le \infty$.

• The sequence exhibits $\ell^p$-cutoff for all $1 < p \le \infty$.

• $\lim_{n \to \infty} (1 - \lambda_2^{(n)}) \, t^{(n)}_{\ell^2}(1/2) = \infty$.

Observe that under reversibility (for any fixed chain) (1.5) expresses an equivalence between the separation and the total-variation mixing times, parallel to the one, expressed in (1.6), holding between the different $\ell^p$ mixing times for $p \in (1,\infty]$. Hence a natural question (in light of Theorem A) is whether (under reversibility) there is cutoff in total-variation if and only if there is cutoff in separation. This is Question 5.1 in [10], where an affirmative answer was given for the class of birth and death chains (which are Markov chains for which the set of edges $(x,y)$ with $P(x,y) > 0$ forms a segment). In fact, both cutoffs were shown to be equivalent to the product condition (3.2).

Theorem B (Ding, Lubetzky and Peres [10], Diaconis and Saloff-Coste [?]). A sequence of birth and death chains exhibits total variation cutoff iff it exhibits separation cutoff.

In this note we give a negative answer to that question in general by constructing counter-examples.

Theorem 1.1. (i) Total-variation and separation cutoff are not equivalent for lazy reversible Markov chains and neither one implies the other.

(ii) The above statement remains true within the class of lazy simple random walks on graphs of maximal degree at most 7.

Remark 1.2. We can also produce non-reversible or non-lazy counter-examples by performing artificial modifications in our chains, but this is not a very important point. Non-lazy or non-reversible chains can have very pathological behavior and we want to underline that we are not using “unfair tricks” to produce our counter-examples.

Of course a full proof of this statement only requires two counter-examples as (ii) is a stronger statement than (i). However, we have chosen to also include examples that are not simple random walks because they are much simpler. We present a total of five counter-examples. Apart from the first one, they are all lazy (weighted nearest-neighbor) random walks on bounded degree graphs, with transition rates which are bounded away from zero. The last two examples, which are a bit more technical to analyze, are lazy SRWs on a sequence of bounded degree graphs $G_n := (V_n, E_n)$ (i.e. $\sup_n \max_{v \in V_n} \deg(v) < \infty$).

Note that for all our counter-examples the graph supporting the transitions contains some cycles. An interesting open problem is to determine whether Theorem B can be extended to the case of lazy weighted nearest-neighbor random walks on trees, for which it is already known (cf. [5]) that separation cutoff implies total-variation cutoff.

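A sequence of Markov chains is said to display pre-cutoff (in total-variation resp. separation) if

\[ \sup_{0 < \epsilon < 1/2} \limsup_{n \to \infty} \frac{t^{(n)}_{\mathrm{mix}}(\epsilon)}{t^{(n)}_{\mathrm{mix}}(1 - \epsilon)} < \infty \quad \text{resp.} \quad \sup_{0 < \epsilon < 1/2} \limsup_{n \to \infty} \frac{t^{(n)}_{\mathrm{sep}}(\epsilon)}{t^{(n)}_{\mathrm{sep}}(1 - \epsilon)} < \infty. \]

We call the value of the sup above the pre-cutoff ratio. Equation (1.5) implies that

\[ \sup_{\epsilon \in (0,1/2]} \limsup_{n \to \infty} \frac{t^{(n)}_{\mathrm{sep}}(\epsilon)}{t^{(n)}_{\mathrm{sep}}(1 - \epsilon)} \le 2 \sup_{\epsilon \in (0,1/2]} \limsup_{n \to \infty} \frac{t^{(n)}_{\mathrm{mix}}(\epsilon)}{t^{(n)}_{\mathrm{mix}}(1 - \epsilon)}. \tag{1.7} \]

A symmetrized version of this inequality also holds provided that $t^{(n)}_{\mathrm{mix}}$ goes to infinity (the assumption being present just to avoid pathological behavior):

\[ \sup_{\epsilon \in (0,1/2]} \limsup_{n \to \infty} \frac{t^{(n)}_{\mathrm{mix}}(\epsilon)}{t^{(n)}_{\mathrm{mix}}(1 - \epsilon)} \le 2 \sup_{\epsilon \in (0,1/2]} \limsup_{n \to \infty} \frac{t^{(n)}_{\mathrm{sep}}(\epsilon)}{t^{(n)}_{\mathrm{sep}}(1 - \epsilon)}. \tag{1.8} \]

The proof of (1.8) involves more computation than that of (1.7). We present a complete proof of it in Appendix A.2.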

These two inequalities imply that the notion of pre-cutoff is equivalent for the two 
distances and the pre-cutoff ratio of one is at most twice that of the other. In particular, 
cutoff in one distance implies pre-cutoff with ratio at most 2 in the other. With our 
examples, we shall show that this is in fact sharp in some cases: 

Remark 1.3. There exists a sequence of lazy reversible Markov chains for which we have 
cutoff in total-variation and only pre-cutoff with ratio 2 in separation and vice-versa. 


Our last point of comparison between total-variation mixing and separation mixing is related to the width of the cutoff window. We say that a sequence of chains exhibits total-variation (resp. separation) cutoff with a cutoff window $w_n$ if $w_n = o(t^{(n)}_{\mathrm{mix}})$ and for all $0 < \epsilon < 1/4$ there exists some constant $C_\epsilon > 0$ (depending only on $\epsilon$) such that

\[ \forall n, \quad t^{(n)}_{\mathrm{mix}}(\epsilon) - t^{(n)}_{\mathrm{mix}}(1 - \epsilon) \le C_\epsilon w_n \quad \big( \text{resp. } t^{(n)}_{\mathrm{sep}}(\epsilon) - t^{(n)}_{\mathrm{sep}}(1 - \epsilon) \le C_\epsilon w_n \big). \]

Note that the window defined in this manner is not unique, but informally “the” cutoff window is given by the “smallest” such $w_n$. Our examples demonstrate that the cutoff windows for total-variation and separation do not have the same behavior.

The following result is due to Chen and Saloff-Coste [?, Theorem 3.4]. We present a much simpler proof in the Appendix.

Theorem C. Let $(\Omega_n, P_n, \pi_n)$ be a sequence of lazy irreducible finite chains which exhibits total-variation cutoff with a cutoff window $w_n$. Then $w_n = \Omega\big( \sqrt{t^{(n)}_{\mathrm{mix}}} \big)$.

The bound given by Theorem C is obviously sharp for the biased random walk on a segment. Conversely, some very standard Markov chains like the lazy SRW on the $n$-dimensional hyper-cube have a cutoff window $w_n \gg \sqrt{t^{(n)}_{\mathrm{mix}}}$ (here $w_n = n$ and $t^{(n)}_{\mathrm{mix}} = (\frac{1}{2} \pm o(1)) n \log n$). As indicated in Remark 1.6, the laziness assumption in Theorem C can be replaced by the assumption that $\inf_n \min_{x \in \Omega_n} P_n^2(x,x) > 0$ (as is the case for simple random walk on a sequence of bounded degree graphs).


In light of Theorem C one might expect that whenever separation cutoff occurs for a sequence of discrete-time lazy chains, the width of the separation cutoff window is $\Omega\big( \sqrt{t^{(n)}_{\mathrm{sep}}} \big)$. We are unaware of any previously analyzed example in which this fails. We find it remarkable that, as the following remark asserts, the width of the separation cutoff window for a sequence of discrete-time lazy SRWs on a sequence of bounded degree graphs can in fact be a constant! This, or more precisely, the mechanism that allows such behavior (see § 2.4 for more on this point) demonstrates that the separation distance can exhibit profoundly different behaviors than the total variation distance.

Our counter-examples show that the cutoff window in one distance can be as small as allowed even if there is no cutoff for the other distance:


Remark 1.4. We will construct sequences of bounded degree graphs such that the corresponding sequences of lazy SRWs exhibit the following behaviors (resp.)

(i) There is no separation cutoff but there is total-variation cutoff with window $\sqrt{t^{(n)}_{\mathrm{mix}}}$.

(ii) There is no total-variation cutoff but there is separation cutoff with window 1.

In § 2.4 we refine the statement of (ii) and describe further surprising properties of the relevant example for (ii) above (listed in § 2.4 as properties (i)-(v)).


Remark 1.5. Let $\delta_n \in (0,1)$. We call a sequence of discrete time chains $(\Omega_n, P_n, \pi_n)$ $\delta_n$-lazy if for all $n$, $P_n(x,x) \ge \delta_n$ for all $x \in \Omega_n$. It is not hard to extend the proof of Theorem C and show that if a sequence of $\delta_n$-lazy chains exhibits total-variation cutoff with a window $w_n$, then $w_n = \Omega\big( \sqrt{\delta_n (1 - \delta_n) t^{(n)}_{\mathrm{mix}}} \big)$.

Theorem C can also be extended to the continuous time setup, with the additional assumption that the sum of the transition rates from any given state is bounded above by 1 (or by some absolute constant).


Remark 1.6. Let $G_n = (V_n, E_n)$ be a sequence of connected non-bipartite simple graphs of maximal degree $d_n$. Consider the sequence of (non-lazy) SRWs on $G_n$. Then $P_n^2(v,v) \ge 1/d_n$ for every $v \in V_n$. By considering $P^2$ rather than $P$ it follows from the previous remark that if the sequence exhibits total-variation cutoff with a window $w_n$, then $w_n = \Omega\big( \sqrt{t^{(n)}_{\mathrm{mix}}/d_n} \big)$. This is in fact sharp by considering a sequence of random $d_n$-regular graphs of size $n$ for some $d_n$ such that $\lim_{n \to \infty} d_n = \infty$ and d_n = o( - ) [17, Theorem 3].

1.1. Organization of the note. In § 2 we describe the construction of our examples and our general strategy. We also describe relevant examples due to Aldous and Pak.

In § 3 we introduce a general framework which, under a certain condition, allows one to reduce the study of the mixing-time to the study of the hitting time of a special point.

In § 4 we describe two examples of sequences of Markov chains which exhibit total-variation cutoff but do not exhibit separation cutoff. The first example, Example 1, demonstrates that (1.7) may be sharp (even when the r.h.s. of (1.7) equals 1). The second example, Example 2, is a weighted nearest neighbor random walk on a bounded degree graph with transition probabilities which are bounded away from 0 and 1.

In § 5 we construct an example of a sequence of Markov chains that exhibits separation cutoff but no total-variation cutoff (Example 3).

Finally, in § 6 we transform Examples 2 and 3 into examples of sequences of lazy SRWs on bounded degree expander graphs. The reason we first describe Examples 2 and 3 is that the key ideas of our constructions are more transparent in these examples.


2. An overview of the main ideas of our constructions

2.1. A very basic chain with different cutoff times for separation and total variation. In this section we settle for a high-level description of some key ideas. Let us first present a very simple Markov chain which exhibits cutoff in both distances (see Figure 1) but for which the mixing-time in separation is twice as large as that in total variation.

Consider a random walk on a segment $[a, b]$ of length $2n$ which presents a constant bias towards the middle point, which we call $z$ (see Figure 1). Most of the equilibrium measure is concentrated on a small neighborhood of $z$ and for this reason (cf. Proposition 3.3) the total-variation mixing-time corresponds to the time which is needed to hit $z$ (starting from either of the end-points). The system displays cutoff because this hitting time is concentrated around its mean.
[Figure 1: a chain on the segment from a to b with middle point z; transition-rate labels, from a to b: 1/2, 1/6, 1/3, 1/4, 1/4, 1/3, 1/6, 1/2.]

Figure 1. A very simple chain for which the separation mixing-time is twice as large as the total-variation mixing-time ($12n$ and $6n$, respectively). The transition rates (apart from at the special states $a$, $b$ and $z$) are 1/3 in the $z$ direction and 1/6 in the opposite one (the holding probability is 1/2), making the chain travel at speed 1/6 towards $z$.

The separation mixing-time on the other hand is twice as large. Roughly speaking, this is because for $P^t(a,b)$ to come close to its equilibrium value, “information” has to pass from one end to the other. The time required for this to occur corresponds more or less to the sum of the times needed to reach $z$ from $a$ and $b$, respectively (see Proposition 3.8).
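The factor of two can be observed numerically. The following sketch is an illustration under stated assumptions: the boundary rates (probability 1/2 of leaving $a$ or $b$, and 1/4 in each direction at $z$) are read off the labels of Figure 1, and the function names and the choice $n = 20$ are hypothetical. For moderate $n$ the ratio $t_{\mathrm{sep}}/t_{\mathrm{mix}}$ is only roughly 2, since both times carry corrections of order $\sqrt{n}$.

```python
def figure1_chain(n):
    """Up/down/hold probabilities on {0,...,2n}; a = 0, z = n, b = 2n."""
    size = 2 * n + 1
    up = [0.0] * size      # up[x]   = P(x, x + 1)
    down = [0.0] * size    # down[x] = P(x, x - 1)
    for x in range(size):
        if x == 0:
            up[x] = 0.5
        elif x == size - 1:
            down[x] = 0.5
        elif x == n:
            up[x] = down[x] = 0.25
        elif x < n:
            up[x], down[x] = 1 / 3, 1 / 6    # biased towards z from the a-side
        else:
            up[x], down[x] = 1 / 6, 1 / 3    # biased towards z from the b-side
    hold = [1.0 - u - d for u, d in zip(up, down)]
    weights = [1.0]                          # stationary law via detailed balance
    for x in range(size - 1):
        weights.append(weights[-1] * up[x] / down[x + 1])
    total = sum(weights)
    return up, down, hold, [w / total for w in weights]

def step(row, up, down, hold):
    """Push the law `row` of X_t one step forward (tridiagonal update)."""
    size = len(row)
    new = [row[y] * hold[y] for y in range(size)]
    for y in range(size - 1):
        new[y + 1] += row[y] * up[y]
        new[y] += row[y + 1] * down[y + 1]
    return new

def mixing_and_separation_times(n, eps=0.25, t_max=100_000):
    up, down, hold, pi = figure1_chain(n)
    size = len(pi)
    rows = [[1.0 if x == y else 0.0 for y in range(size)] for x in range(size)]
    t_mix = t_sep = None
    for t in range(t_max + 1):
        if t_mix is None:
            d = max(0.5 * sum(abs(p - q) for p, q in zip(row, pi)) for row in rows)
            if d <= eps:
                t_mix = t
        if t_sep is None:
            s = 1.0 - min(row[y] / pi[y] for row in rows for y in range(size))
            if s <= eps:
                t_sep = t
        if t_mix is not None and t_sep is not None:
            return t_mix, t_sep
        rows = [step(row, up, down, hold) for row in rows]
    raise ValueError("did not mix within t_max steps")

t_mix, t_sep = mixing_and_separation_times(20)
print(t_mix, t_sep, t_sep / t_mix)
```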

This scheme with two extremal opposite initial conditions, though not ubiquitous among Markov chains, appears in many natural examples for which cutoff has been proved: e.g. the lazy SRW on the hyper-cube (see [16, Theorem 18.3]), the Ising model at high temperature [?] or the adjacent-transposition shuffle on the segment [?].

2.2. An idea to avoid cutoff in separation while keeping that in total-variation. 

Our idea to produce counter-examples with total-variation cutoff but only pre-cutoff in separation is to modify the structure (state space and transition rates) of the simple chain above (Figure 1) only on one side (say, the side of $b$), to break the symmetry. To be precise, in Example 2 we first set the holding probabilities on both sides to be 3/4 (and consider the obtained chain as the “original chain”, as opposed to Example 1 for which the chain in Figure 1 serves as the “original chain”) before modifying the $b$-side. We want to perform our modifications in the following manner:

• We want to keep the property that every path from $a$ to $b$ goes through $z$, which shall still bear a positive proportion of the equilibrium mass.

• We want $a$ to remain the initial condition from which it takes the longest time to reach equilibrium (equivalently, to hit $z$). More precisely, we want that also after the modification, the distribution of the hitting time of $z$, $T_z := \inf\{t : X_t = z\}$, starting from $a$ still stochastically dominates the distribution of $T_z$ starting from any other initial state. Moreover, we want the hitting time distribution of $z$, starting from any state between $a$ and $z$ (including $a$), to remain unchanged.

• We want the hitting time of $z$ from initial state $b$ to become non-concentrated, and to remain of the same order of magnitude as the mixing-time of the whole chain. Moreover, we want this hitting time to remain (stochastically) larger than the hitting time of $z$ starting from any other state which lies between $b$ and $z$, and to become stochastically dominated by the hitting time distribution of $z$ (from $b$) in the original chain (which equals the hitting time distribution from $a$ in the modified chain).

In this manner, the hitting time distribution of $z$ under $P_a$ remains unchanged (and in particular, remains concentrated). Moreover, after the modification it is still the case that $d(t) \approx P_a[T_z > t]$, and thus by the aforementioned concentration there is still cutoff in total-variation (see Proposition 3.3). Using Proposition 3.8 we deduce that $d_{\mathrm{sep}}(t_{\mathrm{mix}} + t) \approx P_b[T_z > t]$ and so there is no cutoff in separation, as the hitting time distribution of $z$ under $P_b$ in the modified chain is no longer concentrated.
To perform such a modification, we borrow ideas from previous constructions of Pak (for Example 1) and Aldous (for Example 2), which we present now.

2.3. Related Constructions. When the product condition (Definition 3.1) was shown to be a necessary condition for cutoff, it was conjectured that it should also be a sufficient one for “nice” chains. However, two counter-examples constructed, respectively, by Aldous and Pak (see [5, Example 8.1], [7] and [16, Chapter 18] for a more detailed description and analysis), show that in general the product condition does not imply cutoff. The mechanisms used to prevent cutoff in those two constructions are of different nature.

• Aldous’ example (Figure 2) locally looks like a biased random walk on a segment, so that most of the equilibrium measure is concentrated on a small neighborhood of the end-point towards which the walk is biased (we call this end of the segment $z$ and the opposite one $b$). To avoid cutoff, the half of the segment closer to $z$ is split into two distinct parallel branches. The transition rates on these branches are tuned so that there is still a bias towards $z$ but such that one path is slower than the other. Starting furthest away from equilibrium (i.e. at state $b$) we have two possible scenarios to reach $z$, given by the two distinct branches, and the probability of each is bounded away from 0 and 1. As the speed along the two branches is different, the CDF of the hitting time distribution of $z$ starting from $b$ has two abrupt jumps. Consequently, $d^{(n)}(t)$ exhibits two distinct abrupt drops and there is no cutoff.

• Pak’s idea is to start with a sequence of chains which exhibits cutoff and to modify it by adding transitions which are such that with a constant rate (which is chosen to be somewhere between the spectral gap and the inverse of the mixing-time of the original chain, say their geometric mean) the system is brought to equilibrium at once. For the modified Markov chain, the total-variation distance decays (up to a negligible error) exponentially with the rate of the newly added transitions and hence neither cutoff nor pre-cutoff occurs.


[Figure 2: Aldous’ example, a segment from b to z whose half closer to z splits into two parallel branches; transition-rate labels 1/6 and 1/12.]

Figure 2. A version of Aldous’ example. The walk is always biased towards $z$ but the speed of the walk depends on the branch. On the top branch, as well as on the rest of the segment, the transition rates are 1/6 in the $z$ direction and 1/12 in the opposite one (the holding probability is 3/4) whereas on the bottom branch the (exit) rates are twice as large (and the holding probability is 1/2), resulting in a larger speed. As a result, two transitions occur for the total-variation distance at times $9n$ and $12n$ respectively, where $n$ denotes the total distance from $z$ to $b$ and the length of each of the two parallel branches is $\lfloor n/2 \rfloor$ (above $n = 14$). The rates at $b$, $z$ and at the branching point are not very relevant but we display them for the sake of concreteness.
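The two transition times in the caption can be recovered by a back-of-the-envelope computation, treating the walk as moving deterministically at its mean speed (a heuristic that ignores fluctuations of order $\sqrt{n}$ and the rounding of $n/2$). The symbols $v_{\mathrm{slow}}$, $v_{\mathrm{fast}}$, $t_{\mathrm{top}}$ and $t_{\mathrm{bottom}}$ are introduced here only for this illustration.

```latex
\begin{align*}
v_{\mathrm{slow}} &= \tfrac{1}{6} - \tfrac{1}{12} = \tfrac{1}{12}, &
v_{\mathrm{fast}} &= \tfrac{1}{3} - \tfrac{1}{6} = \tfrac{1}{6}, \\
t_{\mathrm{top}} &\approx \frac{n}{v_{\mathrm{slow}}} = 12n, &
t_{\mathrm{bottom}} &\approx \frac{n/2}{v_{\mathrm{slow}}} + \frac{n/2}{v_{\mathrm{fast}}} = 6n + 3n = 9n.
\end{align*}
```

A walk taking the top branch travels the whole distance $n$ at the slow speed, while a walk taking the bottom branch travels the first half at the slow speed and the second half at the fast one.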


In our Example 1 (see Figure 3), we adapt Pak’s idea: on the $b$-side (of the chain from Figure 1) we add transitions from states on the $b$-side to the center of mass $z$, and we choose the inverse of the rate to be of the same order as the mixing-time (which is of order of the length of the segment: $n$). This makes the hitting time of $z$ started from $b$ non-concentrated and (stochastically) smaller than started from $a$. Moreover, after this modification, all of the properties described in the beginning of § 2.2 are satisfied.

In our Example 2 (see Figure 4), we simply replace the $b$-side by Aldous’ construction, and set the holding probability on the $a$-side to be 3/4 (which is the holding probability of the slow branch of the $b$-side). After this modification, all of the properties described in the beginning of § 2.2 are satisfied.

2.4. An idea to keep cutoff in separation while avoiding that in total-variation. 

For this part we must rely on a different idea. What we want to alter in our chain is the way the separation distance shrinks to zero. Loosely speaking, in the original chain on the segment, the separation mixing-time is determined by the sum of the hitting times of $z$ from $a$ and $b$, since $z$ is the only channel of communication between the two extremities.

Our construction (Example 3) relies on the following idea (see Figure 5). We take the length of the line segment to be $2(M+1)n$ for some large (fixed) integer $M$.

• We connect the two sides of the segment at a second point $z'$ which is far from the center of mass $z$. We do so by merging the two states which are of distance $n$ from $z$ (one on the $a$-side and one on the $b$-side) into a single state $z'$. This connection maintains the cutoff in separation. However, it has the effect of shortening the separation cutoff time by some constant factor, while, as we now describe, drastically altering the nature of the abrupt transition of $d^{(n)}_{\mathrm{sep}}(t)$ around the (separation) cutoff time. It follows from our analysis of Example 3 and the refined analysis of Example 5 in § 6.5 that, provided that $M$ is taken to be sufficiently large:

(i) Also after creating the connection at $z'$ we have that

\[ \lim_{n \to \infty} \sup_t \big| d^{(n)}_{\mathrm{sep}}(t) - \max\big( 0, 1 - P_n^t(a,b)/\pi_n(b) \big) \big| = 0. \]

(ii) Due to the connection of $A$ and $B$ at the point $z'$, up to negligible terms, around the separation cutoff time, $P_n^t(a,b)$ is supported by trajectories which never get much closer to $z$ than $z'$ is, and so are contained in a set whose stationary probability is exponentially small in $n$.

(iii) Let $T_{z'}^{a,b}$ (Definition 3.4) be a random variable distributed as a convolution of the hitting time distribution of $z'$ started from $a$ with that started from $b$ (in this case the two distributions are identical). Around the (separation) cutoff time, $P_n^t(a,b)/\pi_n(b)$ can be understood in terms of the behavior of $T_{z'}^{a,b}$ in the large deviation regime (namely, the cutoff occurs around the time $t$ for which $P[T_{z'}^{a,b} \le t] \approx \pi_n(z') = \Theta(2^{-n})$).

(iv) Around $t^{(n)}_{\mathrm{sep}}$, $P_n^t(a,b)/\pi_n(b)$ grows exponentially in $t - t^{(n)}_{\mathrm{sep}}$ for $t \ge t^{(n)}_{\mathrm{sep}}$ (and decays exponentially for $t < t^{(n)}_{\mathrm{sep}}$) and continues to do so for $\Theta(n)$ steps around $t^{(n)}_{\mathrm{sep}}$ (in particular, shortly after $t^{(n)}_{\mathrm{sep}}$ the pair $(a,b)$ no longer minimizes $P_n^t(x,y)/\pi_n(y)$). By (i), it follows that $w_n = 1$ is a (separation) cutoff window (and we can take $C_\epsilon = C |\log \epsilon|$, for some absolute constant $C$, for all $\epsilon \in (0,1/4]$).

(v) $\sup_t P_n^t(a,b)/\pi_n(b) = \Theta\big( \max_t P[T_{z'}^{a,b} = t]/\pi_n(z') \big) = \Theta(2^n/n) \to \infty$ as $n \to \infty$.

This behavior (namely, on the one hand having property (i) and on the other having properties (ii), (iv) and (v)) is atypical and quite surprising at first sight.
We are not done yet, as after creating the connection at $z'$, there are two symmetric parallel distinct branches from $z'$ to the center of mass $z$, resulting in the hitting time of $z$ from either $a$ or $b$ being concentrated. Consequently, there is still cutoff in total-variation (as by Proposition 3.3, $d(t) \approx P_a[T_z > t] = P_b[T_z > t]$).

• We break the symmetry (between the two branches, but not between $a$ and $b$) in order to “destroy” the cutoff in total-variation by making the speed along the two paths which link $z'$ to $z$ different, as in Aldous’ example (Figure 2). Observe that as opposed to Examples 1 and 2, here $a$ and $b$ play symmetric roles (the chain looks the same starting from either one of them).

As one should expect from property (ii) above (provided that M is sufficiently large), 
breaking the symmetry as described above does not influence the asymptotic pattern 
of convergence in separation, and (i)-(v) above remain valid. However the quantitative 
analysis of this example turns out to be more intricate than that of the first two. 

2.5. Constructing counter-examples which are lazy SRW on bounded degree graphs. It was observed by Peres and Wilson that the sequence of chains in Aldous’ example could be modified into a sequence of lazy SRWs on bounded degree expander graphs (see Definition 3.6). In [18] Lubetzky and Sly constructed explicit 3-regular expanders with total-variation cutoff.

We use similar ideas to transform our Examples 2 and 3 into SRWs on bounded degree graphs (Examples 4 and 5). Our constructions include one new idea: by introducing a sufficient amount of symmetry, (roughly speaking) we are able to reduce the analysis of Examples 4 and 5 to that of Examples 2 and 3. Consequently, the analysis of the asymptotic convergence profile of $d_n(t)$ is simpler than in [18] (at the cost of having maximal degree $\le 7$ rather than 3).


3. Preliminaries 


The aim of this section is to introduce some general theory which shall reduce the analysis of our Examples 1-3 to the analysis of hitting time distributions of a specific state. The results appearing in this section are later generalized in § 6.1 (these generalizations reduce the analysis of Examples 4 and 5 to the analysis of hitting time distributions of a specific set). All proofs are deferred to the appendix. As we shall only prove the more general versions, we now describe the correspondence between the results of this section and the ones from § 6.1: Proposition 6.4 corresponds to Proposition 3.3, Lemma 6.3 to Lemma 3.5, and Proposition 6.5 to Proposition 3.8.

Let us first introduce some notation and standard terminology. Recall that if $(\Omega, P, \pi)$ is a finite irreducible reversible Markov chain, then $P$ is self-adjoint w.r.t. the inner product induced by $\pi$ on $\mathbb{R}^{\Omega}$,

\[ \langle f, g \rangle_\pi := \sum_{x \in \Omega} \pi(x) g(x) f(x). \tag{3.1} \]

Hence it has $|\Omega|$ real eigenvalues satisfying $1 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_{|\Omega|} \ge -1$ (where $\lambda_2 < 1$ since the chain is irreducible, and if the chain is lazy then $\lambda_{|\Omega|} \ge 0$). Define its relaxation-time as $t_{\mathrm{rel}} := (1 - \max(\lambda_2, |\lambda_{|\Omega|}|))^{-1}$. Note that under laziness $t_{\mathrm{rel}} = (1 - \lambda_2)^{-1}$.

Definition 3.1. We say that a family of reversible Markov chains satisfies the product condition if

\[ \lim_{n \to \infty} \big( 1 - \max(\lambda_2^{(n)}, |\lambda_{|\Omega_n|}^{(n)}|) \big) \, t^{(n)}_{\mathrm{mix}} = \infty \quad \big( \text{equivalently, } t^{(n)}_{\mathrm{rel}} = o(t^{(n)}_{\mathrm{mix}}) \big). \tag{3.2} \]

Because of the following well-known fact (e.g. [16, Proposition 18.4]), all our counter-examples satisfy the product condition.

Fact 3.2. For a sequence of irreducible aperiodic reversible Markov chains with relaxation-times $t^{(n)}_{\mathrm{rel}}$ and mixing-times $t^{(n)}_{\mathrm{mix}}$, if the sequence exhibits a pre-cutoff (either in total-variation or separation) and $\lim_{n \to \infty} t^{(n)}_{\mathrm{mix}} = \infty$, then $t^{(n)}_{\mathrm{rel}} = o(t^{(n)}_{\mathrm{mix}})$.

Given $z \in \Omega$ we let

\[ T_z := \inf\{ t : X_t = z \} \]

denote the hitting time of $z$. The following result allows us to characterize the mixing-time of the chain in terms of the hitting time of a given point which carries a positive proportion of the mass. As hitting times are sometimes easier to control than mixing-times, it will assist us in determining the total-variation profile of convergence to equilibrium in Examples 1-3.

Proposition 3.3. Let $(\Omega_n, P_n, \pi_n)$ be a sequence of lazy reversible irreducible finite Markov chains which satisfies the product condition. Let us furthermore assume that there exists $z_n \in \Omega_n$ such that

\[ \inf_n \pi_n(z_n) > 0. \tag{3.3} \]

Then setting

\[ \tau_n(p) := \inf \Big\{ t : \max_{x \in \Omega_n} P_x[T_{z_n} > t] \le p \Big\}, \tag{3.4} \]

we have for any $\epsilon < \epsilon' \in (0,1)$

\[ \limsup_{n \to \infty} \frac{t^{(n)}_{\mathrm{mix}}(\epsilon')}{\tau_n(\epsilon)} \le 1 \quad \text{and} \quad \liminf_{n \to \infty} \frac{t^{(n)}_{\mathrm{mix}}(\epsilon)}{\tau_n(\epsilon')} \ge 1. \tag{3.5} \]

Note that in particular the result shows that total-variation cutoff occurs if and only if $\tau_n(\cdot)$ displays the following abrupt transition:

\[ \forall \epsilon \in (0,1/2], \quad \lim_{n \to \infty} \frac{\tau_n(1 - \epsilon)}{\tau_n(\epsilon)} = 1. \tag{3.6} \]
To characterize the separation time, we introduce a notion of “double-hitting time”.

Definition 3.4. Given $x, y$ and $z$ in $\Omega$, we let $T_z^{x,y}$ denote a random variable obtained by taking the sum of two independent realizations of $T_z$, once under $P_x$ and once under $P_y$. That is, $P[T_z^{x,y} = t] := \sum_{k=0}^{t} P_x[T_z = k] \, P_y[T_z = t - k]$.

Lemma 3.5. Let $(\Omega, P, \pi)$ be a finite irreducible lazy reversible Markov chain. Consider $x, y, z \in \Omega$.

(i) For all $t \ge 0$ we have that

\[ P^t(x,y)/\pi(y) \ge \sum_{k \le t} P[T_z^{x,y} = k] \, P^{t-k}(z,z)/\pi(z) \ge P[T_z^{x,y} \le t]. \tag{3.7} \]

In particular,

\[ P^t(x,y)/\pi(y) \ge P_x[T_y \le t]. \tag{3.8} \]

(ii) If $P_x[T_z < T_y] = 1$ (i.e. if every path from $x$ to $y$ goes through $z$) then for all $t \ge 0$

\[ P^t(x,y)/\pi(y) = \sum_{k \le t} P[T_z^{x,y} = k] \, P^{t-k}(z,z)/\pi(z) \le P[T_z^{x,y} \le t] + t_{\mathrm{rel}} \max_{k \in \mathbb{N}} P[T_z^{x,y} = k] \sqrt{(1 - \pi(z))/\pi(z)}. \tag{3.9} \]
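The equality in (3.9) and the second inequality in (3.7) can be verified in exact arithmetic on a small example. The sketch below is an illustration only: the chain (a symmetric lazy birth and death chain on $\{0, \dots, 4\}$ with $x = 0$, $z = 2$, $y = 4$), the horizon `t_max` and the function names are hypothetical choices; the error term involving $t_{\mathrm{rel}}$ is not checked.

```python
from fractions import Fraction as F

# Hypothetical lazy birth and death chain on {0,...,4}; z = 2 separates x = 0 from y = 4.
N, x, z, y = 5, 0, 2, 4
P = [[F(0)] * N for _ in range(N)]
for i in range(N):
    if i < N - 1:
        P[i][i + 1] = F(1, 4)
    if i > 0:
        P[i][i - 1] = F(1, 4)
    P[i][i] = 1 - sum(P[i])
pi = [F(1, N)] * N          # P is symmetric, so the uniform law is reversible

def step(row):
    return [sum(row[k] * P[k][j] for k in range(N)) for j in range(N)]

def hitting_pmf(start, target, t_max):
    """pmf[k] = P_start[T_target = k] for k = 0..t_max (chain killed at target)."""
    pmf = [F(0)] * (t_max + 1)
    if start == target:
        pmf[0] = F(1)
        return pmf
    row = [F(int(i == start)) for i in range(N)]
    for k in range(1, t_max + 1):
        row = step(row)
        pmf[k] = row[target]
        row[target] = F(0)   # remove the mass that has just hit the target
    return pmf

t_max = 12
hx, hy = hitting_pmf(x, z, t_max), hitting_pmf(y, z, t_max)
# Law of the double-hitting time of Definition 3.4 (a convolution).
double = [sum(hx[k] * hy[t - k] for k in range(t + 1)) for t in range(t_max + 1)]

# return_ratio[s] = P^s(z,z)/pi(z) and kernel_ratio[t] = P^t(x,y)/pi(y).
row_z = [F(int(i == z)) for i in range(N)]
row_x = [F(int(i == x)) for i in range(N)]
return_ratio, kernel_ratio = [], []
for t in range(t_max + 1):
    return_ratio.append(row_z[z] / pi[z])
    kernel_ratio.append(row_x[y] / pi[y])
    row_z, row_x = step(row_z), step(row_x)

middle = [sum(double[k] * return_ratio[t - k] for k in range(t + 1))
          for t in range(t_max + 1)]
cdf = [sum(double[: t + 1]) for t in range(t_max + 1)]
identity_holds = all(kernel_ratio[t] == middle[t] for t in range(t_max + 1))
lower_bound_holds = all(middle[t] >= cdf[t] for t in range(t_max + 1))
print(identity_holds, lower_bound_holds)
```

The identity follows from decomposing a path from $x$ to $y$ at its first visit to $z$ and applying reversibility to the remaining piece, which is why the test uses exact equality of fractions.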








All our examples will be sequences of chains whose spectral gaps are uniformly bounded away from zero, that is, ones satisfying

\[ \inf_n (1 - \lambda_2^{(n)}) > 0. \tag{$*$} \]

Although this is not necessary, working with such chains substantially simplifies the analysis of our examples. To check this condition, we use the notion of the Cheeger constant and the well-known discrete analog of Cheeger’s inequality (3.10) [3, 31 21] (the proof can also be found in [16, Theorem 13.14]).

Definition 3.6. For any (non-empty) set $A \subset \Omega$ we define

$$Q(A) := \sum_{x \in A,\, y \notin A} \pi(x) P(x,y) \quad\text{and}\quad \Phi(A) := Q(A)/\pi(A).$$

We define the Cheeger constant of the chain to be

$$\Phi := \min_{A :\, 0 < \pi(A) \le 1/2} \Phi(A).$$

We call a sequence of chains $(\Omega_n, P_n, \pi_n)$ an expander family if $\inf_n \Phi_n > 0$.

The following result implies that a sequence of reversible chains satisfies ($\star$) if and only
if it is an expander family.


Theorem 3.7. Let $\lambda_2$ be the second largest eigenvalue of a reversible transition matrix on
a finite state space. Let $\Phi$ be as in Definition 3.6. Then

$$\Phi^2/2 \le 1 - \lambda_2 \le 2\Phi. \tag{3.10}$$

It is rather straightforward to check in all of our examples that the Cheeger constant is 
bounded away from zero. 
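
For small chains, the two-sided bound (3.10) can be checked by brute force. The sketch below (an illustration, not part of the original text) enumerates all sets $A$ with $\pi(A) \le 1/2$ to compute the Cheeger constant of Definition 3.6 and compares it with the spectral gap. The toy chain (lazy simple random walk on a 6-cycle) and the helper names are assumptions of this example; the enumeration is exponential in the number of states and is only meant for very small state spaces.

```python
import itertools
import numpy as np

def lazy_chain(W):
    """(1/2)-lazy random walk on a weighted graph with symmetric weight matrix W."""
    w = W.sum(axis=1)
    P = W / (2.0 * w[:, None])
    P[np.diag_indices_from(P)] += 0.5
    return P, w / w.sum()

def cheeger_constant(P, pi):
    """Brute-force minimum of Q(A)/pi(A) over sets A with 0 < pi(A) <= 1/2."""
    n = len(pi)
    best = np.inf
    for r in range(1, n):
        for A in itertools.combinations(range(n), r):
            A = list(A)
            mass = pi[A].sum()
            if mass > 0.5 + 1e-12:
                continue
            Ac = [y for y in range(n) if y not in A]
            Q = (pi[A][:, None] * P[np.ix_(A, Ac)]).sum()
            best = min(best, Q / mass)
    return best

def spectral_gap(P, pi):
    """1 - lambda_2, using the symmetrised matrix D^{1/2} P D^{-1/2}."""
    d = np.sqrt(pi)
    S = (d[:, None] * P) / d[None, :]
    eig = np.sort(np.linalg.eigvalsh((S + S.T) / 2.0))
    return 1.0 - eig[-2]

n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
P, pi = lazy_chain(W)
phi, gap = cheeger_constant(P, pi), spectral_gap(P, pi)
print(phi ** 2 / 2, gap, 2 * phi)
```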


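Proposition 3.8. Let $(\Omega_n, P_n, \pi_n)$ be a sequence of lazy reversible irreducible finite Markov
chains which satisfies ($\star$). Let us furthermore assume that there exist $z_n \in \Omega_n$, sets
$A_n, B_n \subset \Omega_n$, with $A_n \cup B_n = \Omega_n \setminus \{z_n\}$ and $a_n \in A_n$, $b_n \in B_n$, such that

(i) $\inf_n \pi_n(z_n) > 0$.

(ii) For any $x \in A_n$ and $y \in B_n$, $\mathrm{P}_x[T_{z_n} < T_y] = 1$.

(iii) For all $t$

$$\max_{x \in A_n \cup B_n} \mathrm{P}_x[T_{z_n} > t] = \mathrm{P}_{a_n}[T_{z_n} > t] \quad\text{and}\quad \max_{y \in B_n} \mathrm{P}_y[T_{z_n} > t] = \mathrm{P}_{b_n}[T_{z_n} > t].$$

(iv)

$$\liminf_{n\to\infty} \inf_{t \ge 0} \min_{x,y \in A_n} \left( \frac{P_n^t(x,y)}{\pi_n(y)} - \frac{P_n^t(a_n,b_n)}{\pi_n(b_n)} \right) \ge 0. \tag{3.11}$$

(v)

$$\lim_{n\to\infty} \max_{k \ge 0} \mathrm{P}_{a_n}[T_{z_n} = k] = 0. \tag{3.12}$$

Then

$$\lim_{n\to\infty} \sup_{t \ge 0} \left| d_{\mathrm{sep}}^{(n)}(t) - \mathrm{P}[T_{z_n}^{a_n,b_n} > t] \right| = 0. \tag{3.13}$$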

Proof. We want to show that $P_n^t(x,y)/\pi_n(y)$ achieves its smallest value for $(x,y) = (a_n, b_n)$
up to a negligible correction. According to (iv) we do not need to worry about the case
when both $x$ and $y$ lie in $A_n$. For the other cases, condition (iii) combined with Lemma
3.5 guarantees that

$$\forall t,\ \forall (x,y) \in \Omega_n^2 \setminus A_n^2, \qquad P_n^t(x,y)/\pi_n(y) \ge \mathrm{P}[T_{z_n}^{x,y} \le t] \ge \mathrm{P}[T_{z_n}^{a_n,b_n} \le t]. \tag{3.14}$$

Finally, applying Lemma 3.5 again yields that

$$0 \le \frac{P_n^t(a_n,b_n)}{\pi_n(b_n)} - \mathrm{P}[T_{z_n}^{a_n,b_n} \le t] \le \frac{t_{\mathrm{rel}}^{(n)}}{2} \max_{k \ge 0} \mathrm{P}[T_{z_n}^{a_n,b_n} = k] \sqrt{(1-\pi_n(z_n))/\pi_n(z_n)}. \tag{3.15}$$

This allows us to conclude the proof by noticing that the right-hand side of (3.15) is $o(1)$
(using (i) and (v)). □


Remark 3.9. We note that for lazy chains condition (v) in Proposition 3.8 follows
from the condition $\lim_{n\to\infty} \mathrm{dist}(a_n, z) = \infty$ (which is satisfied in the examples to which
we apply it), where $\mathrm{dist}(a_n, z)$ is the minimal $k$ such that $P^k(a_n, z) > 0$. To see this, consider
the non-lazy path the chain performed from $a_n$ to $z$ by time $T_z$, $\gamma = (\gamma_0 = a_n, \gamma_1, \ldots, \gamma_\ell = z)$
(i.e. for all $i < \ell$, $\gamma_{i+1} \ne \gamma_i$ and, possibly after spending some time at $\gamma_i$, the chain moved
to $\gamma_{i+1}$). The conditional law of $T_z$, given $\gamma$, is that of a sum of $\ell$ independent geometric
random variables with parameter 1/2, and so by the local CLT its largest point mass is at
most $C/\sqrt{\ell} \le C/\sqrt{\mathrm{dist}(a_n, z)}$. Finally, note that the largest point mass of a mixture is at
most the maximal largest point mass of a distribution in the mixture.
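
The $C/\sqrt{\ell}$ decay used in Remark 3.9 can be observed numerically. The sketch below (an illustration under the assumption that each holding time is geometric on $\{1, 2, \ldots\}$ with parameter 1/2; the function name is ours) computes the exact law of a sum of $\ell$ such variables by repeated convolution and returns its largest point mass.

```python
import numpy as np

def max_point_mass_of_geometric_sum(ell, p=0.5):
    """Largest point mass of a sum of ell i.i.d. Geometric(p) variables on {1, 2, ...}."""
    support = 40 * ell                        # truncation level, far above the mean ell / p
    k = np.arange(support + 1)
    geom = np.where(k >= 1, p * (1 - p) ** (k - 1.0), 0.0)
    pmf = np.zeros(support + 1)
    pmf[0] = 1.0                              # law of the empty sum
    for _ in range(ell):
        pmf = np.convolve(pmf, geom)[: support + 1]
    return pmf.max()

for ell in (4, 16, 64):
    m = max_point_mass_of_geometric_sum(ell)
    print(ell, m, m * np.sqrt(ell))           # the last column should stay bounded
```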


4. Total-variation cutoff without separation cutoff examples 


In this section we describe two similar examples of sequences of reversible chains which 
exhibit total-variation cutoff but no separation cutoff. The analysis of both examples 
is extremely similar. We present both examples since while the first demonstrates that 
(1.7) is indeed sharp, it is much harder to transform it into an example of lazy SRWs on
bounded degree expander graphs. 


Example 1. Given $n \ge 2$, set $\Omega_n := A \cup \{z\} \cup B$ where $A = A_n := \{a = a_n, a_{n-1}, \ldots, a_1\}$
and $B = B_n := \{b_1, b_2, \ldots, b_{n-1}, b_n = b\}$. For notational convenience we write $a_0 :=
z =: b_0$. The matrix $P_n$ has positive transition rates on the set of (un-oriented) edges
$E = E_A \cup E_B \cup E_{\mathrm{Long}}$, where

$$E_A := \{e_k^A := \{a_k, a_{k-1}\} : k \in [n]\},$$
$$E_B := \{e_k^B := \{b_k, b_{k-1}\} : k \in [n]\}, \tag{4.1}$$
$$E_{\mathrm{Long}} := \{e_k^L := \{z, b_k\} : k \in [n]\}.$$

With a small abuse of notation we define $e_1^L$ and $e_1^B$ to be two distinct parallel edges. To
each of these edges, we associate conductances (or weights), as follows:

$$w_n(e_k^A) = 2^{-k} = w_n(e_k^B), \quad\text{for all } k \in [n],$$
$$w_n(e_k^L) = \frac{w_n(e_k^B) + w_n(e_{k+1}^B)}{n-1} = \frac{3 \cdot 2^{-(k+1)}}{n-1} \ \text{ for } k \in [n-1], \quad\text{and}\quad w_n(e_n^L) = \frac{w_n(e_n^B)}{n-1} = \frac{2^{-n}}{n-1}.$$

We let $P_n$ be the transition matrix of the (1/2)-lazy random walk on the graph $(\Omega_n, E)$
with conductances $w_n$, i.e. we set

$$P_n(x,x) = 1/2 \ \text{ for all } x \in \Omega_n, \qquad P_n(x,y) = \frac{w_n(x,y)\, 1_{x \ne y}}{2 w_n(x)}, \tag{4.2}$$

where $w_n(x) := \sum_{y \in \Omega_n} w_n(x,y)$ with the convention that $w_n(z, b_1) = w_n(e_1^L) + w_n(e_1^B)$.

This Markov chain is reversible with respect to

$$\pi_n(x) = \frac{w_n(x)}{\sum_{y \in \Omega_n} w_n(y)}.$$

Figure 3. A schematic representation of the transition rates for Example 1. On the
segments $A$ and $B$ the transition rates away from and towards the center of mass $z$ are
equal respectively to 1/6 and 1/3 (on the $A$ side) and $(n-1)/(6n)$ and $(n-1)/(3n)$ (on
the $B$ side). The rate for using a green edge to land on $z$ is equal to $1/(2n)$. The rates for
using green edges in the other direction have a more complicated expression prescribed
by reversibility. These rates are described below despite the fact that they play no role
in our analysis.


A simple calculation shows that

$$w_n(z) = 1 + \frac{3 - 2^{-(n-2)}}{2(n-1)}, \qquad \sum_{y \in \Omega_n} w_n(y) = 4(1 - 2^{-n}) + \frac{3 - 2^{-(n-2)}}{n-1}, \tag{4.3}$$

which implies $\lim_{n\to\infty} \pi_n(z) = 1/4$. The transition matrix obtained from $w_n$ is

• $P_n(x,x) = 1/2$, for all $x \in \Omega_n$.

• $P_n(a_n, a_{n-1}) = 1/2$.

• $2P_n(a_i, a_{i+1}) = 1/3 = P_n(a_i, a_{i-1})$, for all $1 \le i < n$.

• $P_n(b_i, z) = \frac{1}{2n}$ for $i \ge 2$.

• $P_n(b_i, b_{i-1}) = \frac{1}{3}\big(1 - \frac{1}{n}\big) = 2P_n(b_i, b_{i+1})$, for all $2 \le i \le n-1$.

• $P_n(b_n, b_{n-1}) = \frac{1}{2} - \frac{1}{2n}$.

• $P_n(b_1, z) = \frac{1}{2n} + \frac{1}{3}\big(1 - \frac{1}{n}\big)$ and $P_n(b_1, b_2) = \frac{1}{6}\big(1 - \frac{1}{n}\big)$.

• $P_n(z, b_1) = \frac{2n+1}{4(2n+1-2^{-(n-2)})} = \frac{1+o(1)}{4}$ and $P_n(z, a_1) = \frac{n-1}{4n+2-2^{-(n-3)}} = \frac{1-o(1)}{4}$.

• $P_n(z, b_k) = \frac{3}{2^{k+1}(2n+1-2^{-(n-2)})} = \frac{3-o(1)}{n 2^{k+2}}$, for $2 \le k \le n-1$, and $P_n(z, b_n) = \frac{1}{2^n(2n+1-2^{-(n-2)})}$.
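
The construction of Example 1 can be reproduced numerically. The following Python sketch (an illustration, not part of the original text) builds the chain from the conductances $w_n(e_k^A) = w_n(e_k^B) = 2^{-k}$, $w_n(e_k^L) = 3 \cdot 2^{-(k+1)}/(n-1)$ for $k \le n-1$ and $w_n(e_n^L) = 2^{-n}/(n-1)$, and lets one check reversibility, the value $P_n(b_i, z) = 1/(2n)$ and the limit $\pi_n(z) \to 1/4$. The indexing of the states ($z = 0$, $a_k = k$, $b_k = n + k$) and the function name are assumptions of this sketch.

```python
import numpy as np

def example1_chain(n):
    """Lazy weighted walk of Example 1 (requires n >= 2).

    State indices (a convention of this sketch): z = 0, a_k = k, b_k = n + k.
    """
    N = 2 * n + 1
    W = np.zeros((N, N))
    a = lambda k: 0 if k == 0 else k
    b = lambda k: 0 if k == 0 else n + k

    def add(u, v, w):
        # parallel edges accumulate, matching the convention for w_n(z, b_1)
        W[u, v] += w
        W[v, u] += w

    for k in range(1, n + 1):
        add(a(k), a(k - 1), 2.0 ** (-k))                        # edge e_k^A
        add(b(k), b(k - 1), 2.0 ** (-k))                        # edge e_k^B
        if k < n:
            add(0, b(k), 3.0 * 2.0 ** (-(k + 1)) / (n - 1))     # edge e_k^L
        else:
            add(0, b(n), 2.0 ** (-n) / (n - 1))                 # edge e_n^L
    w = W.sum(axis=1)
    P = W / (2.0 * w[:, None])
    P[np.diag_indices(N)] += 0.5
    return P, w / w.sum()

for n in (5, 50, 500):
    P, pi = example1_chain(n)
    print(n, pi[0], P[n + 2, 0] * 2 * n)   # pi_n(z), and 2n * P_n(b_2, z)
```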

Note that for this chain, condition ($\star$) is easily verified using Theorem 3.7. Since under
$\mathrm{P}_{a_n}$, $T_z$ is concentrated around time $6n$, to prove total-variation cutoff around time $6n$
for this sequence of chains (using Proposition 3.3), we only need to verify that $a_n$ is the
initial state from which $T_z$ is (stochastically) the largest. A crucial fact which shall assist
us in this task is that for all $i \in [n]$ and all $t$

$$\mathrm{P}_{b_i}[T_z > t] \le \mathrm{P}_{a_i}[T_z > t]. \tag{4.4}$$
The reason for this inequality is the following: We couple $X^A$ and $X^B$ starting from $a_i$ and
$b_i$ (resp.) in the following manner: with probability 1/2 both stay put, with probability
$\frac{1}{2} - \frac{1}{2n}$, $X^A$ and $X^B$ make "the same move" ($+1$ or $-1$, i.e. away from or towards $z$, with
(conditional) probability 1/3 and 2/3, resp., unless the current position of the chain is either
$a_n$ or $b_n$ in which case the move has to be $-1$) and with probability $1/(2n)$, $X^B$ is sent directly to
$z$ while $X^A$ moves towards/away from $z$ with probability 2/3 and 1/3 (unless it is located
at $a_n$). We do not need to specify how the coupling is defined after $X^B$ has hit $z$.

A way to describe $X_t$ starting from $B$ before it hits $z$ is the following: at each step it
is killed (hits $z$) with probability $1/(2n)$ and conditionally on not being killed, it performs "the
same" random walk as that on $A$ (in terms of the index of its current position) but with
holding probability $n/(2n-1) > 1/2$.

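Consequently,

$$\max_{y \in \Omega_n} \mathrm{P}_y[T_z > t] = \max_{y \in A_n} \mathrm{P}_y[T_z > t] = \mathrm{P}_{a_n}[T_z > t], \qquad \max_{y \in B_n} \mathrm{P}_y[T_z > t] = \mathrm{P}_{b_n}[T_z > t]. \tag{4.5}$$

Moreover, it follows from the above discussion that

$$\mathrm{P}_{a_n}[T_z > t]\Big(1 - \frac{1}{2n}\Big)^t \le \mathrm{P}_{b_n}[T_z > t] \le \min\Big\{ \mathrm{P}_{a_n}[T_z > t],\ \Big(1 - \frac{1}{2n}\Big)^t \Big\}. \tag{4.6}$$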
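
The comparisons (4.4) and (4.6) can be checked numerically for moderate $n$. The sketch below (an illustration, not part of the original text) uses the transition probabilities of Example 1 restricted to $A$ and to $B$, with $z$ treated as a cemetery state, and computes the survival probabilities $\mathrm{P}_x[T_z > t]$ by iterating the resulting sub-stochastic matrices. The function names and the index convention (row $k-1$ corresponds to $a_k$, resp. $b_k$) are assumptions of this sketch.

```python
import numpy as np

def killed_matrices(n):
    """Sub-stochastic matrices of Example 1 on A and on B (chain killed at z), n >= 2."""
    QA = np.zeros((n, n))
    QB = np.zeros((n, n))
    r = 1.0 - 1.0 / n
    for k in range(1, n + 1):
        i = k - 1
        QA[i, i] = QB[i, i] = 0.5
        if k < n:
            QA[i, i + 1] = 1.0 / 6.0          # step away from z
            QB[i, i + 1] = r / 6.0
            if k > 1:
                QA[i, i - 1] = 1.0 / 3.0      # step towards z (from index 1 it is absorbed)
                QB[i, i - 1] = r / 3.0
        else:
            QA[i, i - 1] = 0.5
            QB[i, i - 1] = 0.5 - 1.0 / (2 * n)
    return QA, QB

def survival(Q, t_max):
    """out[t, i] = probability that the killed chain started at row i is alive at time t."""
    out = np.empty((t_max + 1, Q.shape[0]))
    v = np.ones(Q.shape[0])
    for t in range(t_max + 1):
        out[t] = v
        v = Q @ v
    return out

n, t_max = 10, 300
QA, QB = killed_matrices(n)
SA, SB = survival(QA, t_max), survival(QB, t_max)
decay = (1.0 - 1.0 / (2 * n)) ** np.arange(t_max + 1)
ok_44 = bool(np.all(SB <= SA + 1e-12))
ok_46 = bool(np.all(SA[:, -1] * decay <= SB[:, -1] + 1e-12)
             and np.all(SB[:, -1] <= np.minimum(SA[:, -1], decay) + 1e-12))
print(ok_44, ok_46)
```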


We now turn to the task of verifying that there is no cutoff in separation. Note that
conditions (i)-(ii) of Proposition 3.8 hold by construction, condition (v) holds by Remark
3.9 while condition (iv) holds by (3.8). Lastly, condition (iii) of Proposition 3.8 follows
from (4.5), and so Proposition 3.8 applies. Consequently,

$$\lim_{n\to\infty} \sup_{t \ge 0} \left| d_{\mathrm{sep}}^{(n)}(t) - \mathrm{P}[T_z^{a_n,b_n} > t] \right| = 0. \tag{4.7}$$

Set $m_n := \lceil n^{2/3} \rceil$ (the exponent 2/3 can be replaced by any number in (1/2,1)). It is
standard to check that

$$\lim_{n\to\infty} \mathrm{P}_{a_n}[\,|T_z - 6n| > m_n] = 0,$$
$$\lim_{n\to\infty} \sup_{t \in [0, 6n - 2m_n]} \left| \mathrm{P}_{b_n}(T_z > t + m_n) - \mathrm{P}_{b_n}(T_z > t) \right| = 0.$$

Hence it follows from (4.7) that for all $c \in (0,6)$

$$\lim_{n\to\infty} \left| d_{\mathrm{sep}}^{(n)}(6n + \lfloor cn \rfloor) - \mathrm{P}_{b_n}[T_z > cn] \right| = 0. \tag{4.8}$$

This and (4.6) yield that

$$\lim_{n\to\infty} d_{\mathrm{sep}}^{(n)}(\lfloor sn \rfloor) = \begin{cases} 1 & \text{if } s < 6, \\ e^{-(s-6)/2} & \text{if } s \in [6,12), \\ 0 & \text{if } s > 12. \end{cases} \tag{4.9}$$

Hence there is no separation cutoff. Moreover,

$$\sup_{0 < \varepsilon < 1/2} \liminf_{n\to\infty} \frac{t_{\mathrm{sep}}^{(n)}(\varepsilon)}{t_{\mathrm{sep}}^{(n)}(1-\varepsilon)} = 2. \tag{4.10}$$


We now describe a variant of the previous example which is a nearest neighbor lazy 
weighted random walk on a bounded degree graph with bounded transition probabilities. 
Example 2. Let $\Omega_n = A \cup B \cup C \cup \{z\}$, where

$$A = A_n := \{a_1, a_2, \ldots, a_{2n}\},$$
$$B = B_n := \{b_1, b_2, \ldots, b_{2n} = b\}, \tag{4.11}$$
$$C = C_n := \{c_1, c_2, \ldots, c_{n-1}\}.$$

For notational convenience we write $a_0 := z =: b_0 = c_0$ and $c_n = b_n$. Consider the
following transition matrix

• $P_n(x,x) = 3/4$ for all $x \in \Omega_n \setminus C$ and $P_n(c_i, c_i) = 1/2$ for all $i \in \{1, \ldots, n-1\}$,

• $P_n(a_{2n}, a_{2n-1}) = 1/4 = P_n(b_{2n}, b_{2n-1})$,

• $2P_n(a_i, a_{i+1}) = 1/6 = P_n(a_i, a_{i-1})$, for all $1 \le i < 2n$,

• $2P_n(b_i, b_{i+1}) = 1/6 = P_n(b_i, b_{i-1})$, for all $i \in [2n-1] \setminus \{n\}$,

• $2P_n(c_i, c_{i+1}) = 1/3 = P_n(c_i, c_{i-1})$, for all $1 \le i \le n-1$,

• $P_n(b_n, b_{n+1}) = P_n(b_n, c_{n-1}) = P_n(b_n, b_{n-1}) = 1/12$,

• $P_n(z, a_1) = P_n(z, b_1) = P_n(z, c_1) = 1/12$.


Figure 4. A schematic representation of the transition rates for Example 2 for $n = 4$.
When at a state of degree two or three (other than $z$), conditioned on making a non-lazy
step, the chain moves away from (resp. towards) $z$ with conditional probability 1/3
(resp. 2/3). For vertices of degree 2: along the green edges, rates away from and towards
the center of mass $z$ are equal respectively to 1/12 and 1/6 and along the red edges they
are equal to 1/6 and 1/3, respectively. The transitions away from vertices of degree 1
and 3 are given on the figure.

States $a_{2n}$, $b_{2n}$ and $z$ play here the same respective roles as $a_n$, $b_n$ and $z$ in the previous
example. A simple calculation (similar to (4.3)) yields that

$$\lim_{n\to\infty} \pi_n(z) = 2/7.$$

We argue that for all $t \ge 0$ and $i \in [2n]$

$$\max\big(\mathrm{P}_{c_i}[T_z > t],\ \mathrm{P}_{b_i}[T_z > t]\big) \le \mathrm{P}_{a_i}[T_z > t] \le \mathrm{P}_{a_{2n}}[T_z > t]. \tag{4.12}$$

In particular,

$$\forall t \ge 0, \qquad \max_{x \in \Omega_n} \mathrm{P}_x[T_z > t] = \mathrm{P}_{a_{2n}}[T_z > t]. \tag{4.13}$$

Since the hitting time of $z$ under $\mathrm{P}_{a_{2n}}$ is concentrated around time $t = 24n$, by Proposition
3.3 the sequence exhibits total-variation cutoff around time $24n$.

The last inequality in (4.12) is trivial. For the first one we consider the case where $P_n$
is replaced by $P'_n$ which satisfies $2P'_n(c_i, c_{i+1}) = 1/6 = P'_n(c_i, c_{i-1})$ and $P'_n(c_i, c_i) = 3/4$
for $1 \le i \le n-1$ and $P'_n(x,y) = P_n(x,y)$ elsewhere. Adding extra laziness increases
stochastically the hitting time $T_z$ (as in Remark 3.9 consider the law of $\gamma$, the non-lazy
path performed by the chain by time $T_z$; clearly it is invariant under this transformation,
while the conditional law of $T_z$, given $\gamma$, can only increase, stochastically). Hence

$$\mathrm{P}_{b_i}[T_z > t] \le \mathrm{P}'_{b_i}[T_z > t] = \mathrm{P}_{a_i}[T_z > t], \tag{4.14}$$

(where $\mathrm{P}'$ denotes the distribution of the modified chain with the increased holding
probability on $C_n$) and the same holds when $b_i$ is replaced by $c_i$.

To prove that $b_{2n}$ is the vertex of $B \cup C$ from which the hitting time of $z$ is the largest, we need
to prove the following two inequalities valid for $i \in \{1, \ldots, n\}$

$$\mathrm{P}_{c_i}[T_z > t] \le \mathrm{P}_{b_i}[T_z > t] \quad\text{and}\quad \mathrm{P}_{b_i}[T_z > t] \le \mathrm{P}_{b_{i+n}}[T_z > t]. \tag{4.15}$$

Both can be proved by coupling arguments. For the first one, we can couple the non-lazy
path of the chains starting from $b_i$ and $c_i$ until they reach either $b_n$ or $z$ (the second being
at position $c_j$ when the first is at position $b_j$), and then in the case they reach $b_n = c_n$
let them evolve together until they reach $z$. The larger laziness on the path starting from
$b_i$ until the merging time implies stochastic domination. For the second inequality, the
case $i = n$ follows from the fact that starting from $b_{2n}$ the chain has to go through $b_n$ before
reaching $z$. For $i < n$, we can couple the chains starting from $b_i$ and $b_{i+n}$ until the pair of
chains reaches either $(b_n, b_{2n})$ or $(z, b_n)$ (the second chain being at position $b_{j+n}$ when the
first is at position $b_j$), and conclude using the case $i = n$.

As in the previous example, we can apply Proposition 3.8. The reason why separation
cutoff does not occur is that when starting from $b_{2n}$, the hitting time $T_z$ is not concentrated.
Indeed it is concentrated around $18n$ under the conditioned probability measure
$\mathrm{P}_{b_{2n}}[\,\cdot \mid X_{T_z - 1} = c_1]$, while it is concentrated around $24n$ under $\mathrm{P}_{b_{2n}}[\,\cdot \mid X_{T_z - 1} = b_1]$. As by
symmetry

$$\mathrm{P}_{b_{2n}}[X_{T_z - 1} = c_1] = \mathrm{P}_{b_{2n}}[X_{T_z - 1} = b_1] = \frac{1}{2},$$

this yields

Lemma 4.1. We have

$$\lim_{n\to\infty} \mathrm{P}_{b_{2n}}[T_z > sn] = \begin{cases} 1 & \text{if } s < 18, \\ 1/2 & \text{if } s \in (18,24), \\ 0 & \text{if } s > 24. \end{cases} \tag{4.16}$$

While this result is rather elementary (we use some surgery to compare $T_z$ with a
sum of independent variables, and then the law of large numbers for this sequence), the
proof in full detail is long to expose (cf. [5, Example 8.1]) and we choose to leave it as
an exercise. Applying Proposition 3.8 for an adequate choice of sets and states (here
$(a_{2n}, b_{2n}, A_n, B_n \cup C_n)$ plays the role of $(a_n, b_n, A_n, B_n)$ from Proposition 3.8) yields

$$\lim_{n\to\infty} d_{\mathrm{sep}}^{(n)}(sn) = \begin{cases} 1 & \text{if } s < 42, \\ 1/2 & \text{if } s \in (42,48), \\ 0 & \text{if } s > 48. \end{cases} \tag{4.17}$$

In particular, there is no cutoff in separation.
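
The two ingredients behind Lemma 4.1 (the symmetry between the exits through $c_1$ and $b_1$, and the linear growth of the hitting times) can be checked by solving small linear systems. The Python sketch below (an illustration, not part of the original text) builds the transition matrix of Example 2 and computes the probability that the chain started at $b_{2n}$ enters $z$ from $c_1$, together with the mean hitting times of $z$ from $a_{2n}$ and from $b_{2n}$. The state indexing and the function names are assumptions of this sketch; means are of course a much cruder statistic than the concentration statements used in the text.

```python
import numpy as np

def example2_chain(n):
    """Transition matrix of Example 2 (requires n >= 2).

    State indices (a convention of this sketch): z = 0, a_i = i, b_i = 2n + i
    for i in 1..2n, and c_i = 4n + i for i in 1..n-1 (with c_n = b_n).
    """
    N = 5 * n
    P = np.zeros((N, N))
    a = lambda i: 0 if i == 0 else i
    b = lambda i: 0 if i == 0 else 2 * n + i
    c = lambda i: 0 if i == 0 else (b(n) if i == n else 4 * n + i)
    for i in range(1, 2 * n):
        P[a(i), a(i + 1)] = 1 / 12
        P[a(i), a(i - 1)] = 1 / 6
        if i != n:
            P[b(i), b(i + 1)] = 1 / 12
            P[b(i), b(i - 1)] = 1 / 6
    P[a(2 * n), a(2 * n - 1)] = 1 / 4
    P[b(2 * n), b(2 * n - 1)] = 1 / 4
    for i in range(1, n):
        P[c(i), c(i + 1)] = 1 / 6
        P[c(i), c(i - 1)] = 1 / 3
    for nbr in (b(n + 1), c(n - 1), b(n - 1)):
        P[b(n), nbr] = 1 / 12
    for nbr in (a(1), b(1), c(1)):
        P[0, nbr] = 1 / 12
    P[np.diag_indices(N)] = 1.0 - P.sum(axis=1)   # holding probabilities
    return P, a, b, c

def exit_and_means(n):
    """Return (P_{b_2n}[X_{T_z - 1} = c_1], E_{a_2n}[T_z], E_{b_2n}[T_z])."""
    P, a, b, c = example2_chain(n)
    Q = P[1:, 1:]                                  # chain killed at z
    M = np.eye(Q.shape[0]) - Q
    r = np.zeros(Q.shape[0])
    r[c(1) - 1] = P[c(1), 0]                       # entering z from c_1
    h = np.linalg.solve(M, r)
    m = np.linalg.solve(M, np.ones(Q.shape[0]))
    return h[b(2 * n) - 1], m[a(2 * n) - 1], m[b(2 * n) - 1]

for n in (10, 20, 40):
    h, m_a, m_b = exit_and_means(n)
    print(n, h, m_a / n, m_b / n)
```

Under the heuristics of the text one expects the printed ratios to approach 24 and $(18 + 24)/2 = 21$ respectively, since the two exits from $b_{2n}$ are equally likely.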

5. Separation cutoff without total variation cutoff example 

In the following example the analysis of the sharp transition of $d_{\mathrm{sep}}(t)$ is reduced to the
analysis of the behavior of sums of i.i.d. random variables in the large deviation regime. The
analysis below is too coarse for the purpose of determining the width of the cutoff window.
We later present a refined analysis for Example 5 (which is the bounded degree un-weighted
version of Example 3) in § 6.5 which shows that in fact $t_{\mathrm{sep}}(\varepsilon) - t_{\mathrm{sep}}(1-\varepsilon) \le C|\log \varepsilon|$,
for some absolute constant $C > 0$. The analysis in § 6.5 is built upon the analysis of
Example 3 below, as it relies (in a non-quantitative manner) on the fact that certain large
deviation estimates hold uniformly over compact sets (the identity of the large deviation
rate function is not important for the analysis in § 6.5).


Example 3. Let $M \ge 10$ be a fixed integer whose exact value shall be determined later.
Consider the state space $\Omega_n = A \cup B \cup \{z\} \cup C \cup D \cup \{z'\}$, where $A = A_n := \{a_{Mn} =
a, a_{Mn-1}, \ldots, a_1\}$, $B = B_n := \{b_{Mn} = b, b_{Mn-1}, \ldots, b_1\}$, $C = C_n := \{c_1, c_2, \ldots, c_{n-1}\}$ and
$D = D_n := \{d_1, d_2, \ldots, d_{n-1}\}$. We use the following notational convention: $a_0 = b_0 :=
z' =: c_n = d_n$ and $d_0 := z =: c_0$. Consider the following transition matrix

$$P_n(i,i) = \begin{cases} 3/4 & i \in C, \\ 1/2 & \text{otherwise.} \end{cases}$$
$$P_n(a, a_{Mn-1}) = 1/2 = P_n(b, b_{Mn-1}).$$
$$P_n(z, c_1) = 1/4 = P_n(z, d_1).$$
$$P_n(z', c_{n-1}) = P_n(z', d_{n-1}) = 1/6 = 2P_n(z', a_1) = 2P_n(z', b_1).$$
$$P_n(a_i, a_{i-1}) = P_n(b_i, b_{i-1}) = P_n(d_j, d_{j-1}) = 2P_n(c_j, c_{j-1}) = 1/3,$$
$$P_n(a_i, a_{i+1}) = P_n(b_i, b_{i+1}) = P_n(d_j, d_{j+1}) = 2P_n(c_j, c_{j+1}) = 1/6,$$

for all $i \in [Mn-1]$ and $j \in [n-1]$.


Figure 5. A schematic representation of the transition rates for Example 3. When at
a state of degree two or four (other than $z$), conditioned on making a non-lazy step, the
chain moves away from (resp. towards) $z$ with conditional probability 1/3 (resp. 2/3).
The transition rates away from and towards the center of mass $z$, from degree two states,
are equal respectively to 1/6 and 1/3, except on the segment $C$, due to increased holding
probability. The transition rates away from the rest of the states are specified in the
figure.

This chain is a modification of Aldous' example (which was discussed earlier). The
difference lies in the introduction of an additional branch $B$ to the graph. This branch
has no effect on the total-variation profile of the convergence to equilibrium, but crucially
modifies the separation profile, as $P_n^t(a,b)/\pi_n(b)$ (recall $a := a_{Mn}$ and $b := b_{Mn}$) is the
quantity that takes the longest time to reach equilibrium (i.e. up to negligible correction
$(x,y) = (a,b)$ maximizes $1 - P_n^t(x,y)/\pi_n(y)$ for all relevant $t$).

A standard calculation yields that

$$\lim_{n\to\infty} \pi_n(z) = 2/11 \quad\text{and}\quad \lim_{n\to\infty} 2^n \pi_n(z') = 6/11. \tag{5.1}$$

By symmetry, the law of $T_z$ starting, resp., from $a_i$ and $b_i$ is identical for all $i$ and by the
Markov property, it is stochastically increasing in $i$ (for $i > j$, to reach $z$ from $a_i$ (resp. $b_i$)
the chain must first hit $a_j$ (resp. $b_j$)). Only minor efforts are necessary to prove rigorously
that $a$ and $b$ are the points in $A \cup B \cup D$ for which the hitting time $T_z$ is stochastically the
largest (the coupling arguments are similar to the ones developed in the previous section),
while for any choice of $M \ge 1$,

$$\limsup_{n\to\infty} \sup_t \Big( \max_{c \in C} \mathrm{P}_c[T_z > t] - \mathrm{P}_a[T_z > t] \Big) \le 0.$$

Due to the different holding probabilities along the two branches $C$, $D$, the distribution
of $T_z$ under $\mathrm{P}_a$ is not concentrated around its mean. Thus, by Proposition 3.3 there is no
total-variation cutoff, and the total-variation asymptotic profile is given by

$$\lim_{n\to\infty} d_{\mathrm{TV}}^{(n)}(6sn) = \begin{cases} 1 & \text{if } s < M+1, \\ 1/2 & \text{if } s \in (M+1, M+2), \\ 0 & \text{if } s > M+2. \end{cases} \tag{5.2}$$

To show that there is separation cutoff, it suffices to prove that

$$\liminf_{n\to\infty} \inf_t \Big[ \min_{x,y \in \Omega_n} P_n^t(x,y)/\pi_n(y) - \min\big(1, P_n^t(a,b)/\pi_n(b)\big) \Big] = 0, \tag{5.3}$$

and to show that $\min(1, P_n^t(a,b)/\pi_n(b))$ displays an abrupt transition. Let us start with
the second point. According to Lemma 3.5 (first inequality of (3.7)), we have

$$\mathrm{P}[T_{z'}^{a,b} = t]/\pi_n(z') \le P_n^t(a,b)/\pi_n(b) \le \mathrm{P}[T_{z'}^{a,b} \le t]/\pi_n(z'). \tag{5.4}$$

By definition $T_{z'}^{a,b}$ is the sum of two independent hitting times of a biased random walk
on a segment of length $Mn$ (started at one end-point and stopped at the other one, towards
which there is a bias). We make some efforts to compute the large deviation behavior of this sum.

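Lemma 5.1. Consider a lazy random walk $(Z_t)_{t \ge 0}$ on $\mathbb{Z}_+$ with rates $p(x, x+1) = 1/3$,
$p(x+1, x) = 1/6$, $x \in \mathbb{Z}_+$. Let $T_N$ be the first hitting time of $N$. We have

$$\lim_{N\to\infty} \frac{1}{N} \log \mathrm{P}[T_N = \lfloor sN \rfloor] = \lim_{N\to\infty} \frac{1}{N} \log \mathrm{P}[T_N \le sN] = -\Psi(s), \quad\text{for } s \in [1,6],$$
$$\lim_{N\to\infty} \frac{1}{N} \log \mathrm{P}[T_N = \lfloor sN \rfloor] = \lim_{N\to\infty} \frac{1}{N} \log \mathrm{P}[T_N \ge sN] = -\Psi(s), \quad\text{for } s \ge 6, \tag{5.5}$$

where $\Psi$ is the following Legendre transform

$$\Psi(s) := \sup_{\lambda \in (-\infty, \infty)} [\lambda s - \log F(\lambda)],$$

where

$$F(\lambda) := \begin{cases} \infty & \text{if } \lambda > \log(6/(3 + 2\sqrt{2})), \\ \dfrac{6e^{-\lambda} - 3}{2} - \dfrac{\sqrt{(6e^{-\lambda} - 3)^2 - 8}}{2} & \text{if } \lambda \le \log(6/(3 + 2\sqrt{2})). \end{cases} \tag{5.6}$$

Moreover, $\Psi(6) = 0 = \Psi'(6)$ and the second derivative $\Psi''(6)$ is positive.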
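
The rate function of Lemma 5.1 is easy to evaluate numerically, which gives a feeling for the quantities used in the sequel. The Python sketch below (an illustration, not part of the original text) implements $\log F$ from (5.6), computes the Legendre transform $\Psi$ by a ternary search (the objective $\lambda \mapsto \lambda s - \log F(\lambda)$ is concave), and solves the equation $2M\Psi(s) = \log 2$ on $(1,6)$ by bisection. The function names, the search window for $\lambda$ and the iteration counts are choices made for this sketch.

```python
import math

LAMBDA_MAX = math.log(6.0 / (3.0 + 2.0 * math.sqrt(2.0)))

def log_F(lam):
    """log E[exp(lam * T'_1)] as in (5.6); +infinity above LAMBDA_MAX."""
    if lam > LAMBDA_MAX:
        return math.inf
    u = 6.0 * math.exp(-lam) - 3.0
    disc = max(u * u - 8.0, 0.0)
    # (u - sqrt(u^2 - 8)) / 2 equals 4 / (u + sqrt(u^2 - 8)); the latter avoids cancellation
    return math.log(4.0 / (u + math.sqrt(disc)))

def Psi(s, lo=-60.0, iters=200):
    """Legendre transform sup_lam [lam * s - log F(lam)], by ternary search."""
    hi = LAMBDA_MAX
    g = lambda lam: lam * s - log_F(lam)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g(0.5 * (lo + hi))

def s_star(M):
    """Solution in (1, 6) of 2 * M * Psi(s) = log 2 (Psi is decreasing on [1, 6])."""
    target = math.log(2.0) / (2.0 * M)
    lo, hi = 1.0, 6.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if Psi(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for M in (10, 100, 1000):
    s = s_star(M)
    print(M, s, (6.0 - s) * math.sqrt(M))   # last column: scaling of 6 - s_* with M
```

A direct first-step computation gives $\mathrm{Var}(T'_1) = 102$, so that $\Psi(s) \approx (s-6)^2/204$ near $s = 6$; this is the quadratic behavior behind the $\Theta(1/\sqrt{M})$ scaling of $6 - s_*$.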
Proof. Let $X'$ be the random walk with the same rates on $\mathbb{Z}$, and $T'_N$ be the first hitting
time of $N$ for this walk. By the Markov property $T'_N$ is the sum of $N$ IID copies of $T'_1$ and
hence we can use Cramér's Theorem (see e.g. [8, Chapter 2]) to obtain the large deviation
for $T'_N$ below its mean. If one decomposes according to the value of $X'_1$ we notice that the
Laplace transform $F(\lambda) := \mathrm{E}[e^{\lambda T'_1}]$ satisfies

$$F(\lambda) = e^{\lambda} \Big( \frac{1}{3} + \frac{1}{6} F(\lambda)^2 + \frac{1}{2} F(\lambda) \Big), \tag{5.7}$$

and we deduce the right value for $F(\lambda)$ from this relation (the fact that $F(0) = 1$ and
continuity of $F$ indicate which root to choose in (5.7)). Note that the derivative of
$\log F(\lambda)$ at zero is equal to 6 which implies that $\Psi(6) = 0$ (alternatively, $\mathrm{E}[T'_1] = 6$, hence
by Cramér's Theorem it must be the case that $\Psi(6) = 0$). As $\Psi$ is non-negative (since
$\log F(0) = 0$), it must be the case that it attains a global minimum at 6, which implies
that $\Psi'(6) = 0$ and $\Psi''(6) > 0$.


Now, note that $T_i - T_{i-1}$ are independent variables, which are dominated by $T'_1$ and which
converge (when $i$ tends to infinity) to $T'_1$ in law. In particular, by dominated convergence
(and Cesàro's Theorem) we have that for any $\lambda \in (-\infty, \log(6/(3 + 2\sqrt{2}))]$,

$$\lim_{N\to\infty} \frac{1}{N} \log \mathrm{E}[e^{\lambda T_N}] = \log F(\lambda), \tag{5.8}$$

and thus in that case the result follows from the Gärtner-Ellis Theorem [8]. Finally, the local
large deviation estimate (the result on $\mathrm{P}[T_N = \lfloor sN \rfloor]$) can be deduced from the large
deviation principle using the fact that due to laziness

$$\frac{\mathrm{P}[T_N = t+1]}{\mathrm{P}[T_N = t]} \ge \frac{1}{2}. \tag{5.9}$$

We leave it as an exercise. Note moreover that the convergence (5.5) holds uniformly on
$s \in K$ for any compact $K$ (it can be deduced e.g. from (5.9)). □


A consequence of (5.4) and the previous lemma in conjunction with (5.1) and Lemma
3.5 is that if $s_M$ is given by $2Ms_*$, where $s_*$ is the unique solution in $(0,6)$ of

$$2M\Psi(s) = \log 2, \tag{5.10}$$

then

$$\lim_{n\to\infty} \frac{P_n^{\lfloor sn \rfloor}(a,b)}{\pi_n(b)} = \begin{cases} 0, & \text{if } s < s_M, \\ \infty, & \text{if } s \in (s_M, 12M]. \end{cases} \tag{5.11}$$

An order 2 Taylor expansion of (5.10) around $s = 6$ readily shows that $6 - s_* = \Theta(1/\sqrt{M})$
(i.e. $12M - s_M = \Theta(\sqrt{M})$) for large $M$. In particular, $s_M > 11M$ for $M$ sufficiently large.
What is left to do in order to prove separation cutoff is to check that for any $s \in (s_M, 12M]$
(in fact, by monotonicity it suffices to consider only $s$ arbitrarily close to $s_M$) we have

$$\liminf_{n\to\infty} \min_{x,y} P_n^{\lfloor sn \rfloor}(x,y)/\pi_n(y) \ge 1.$$

In what follows we let $s \in (s_M, 12M]$ be fixed.

We first use Lemma 3.5 to reduce to the case of $x = a_i$, $y = b_j$, $i, j \ge Mn/2$. Set
$E := \{a_i : i \ge Mn/2\} \cup \{b_i : i \ge Mn/2\}$. By (3.7) for any $x \in \Omega_n$ and $y \in \Omega_n \setminus E$ we have

$$\frac{P_n^{\lfloor sn \rfloor}(x,y)}{\pi_n(y)} \ge \mathrm{P}[T_z^{x,y} \le 11Mn], \tag{5.12}$$

and it is a simple exercise to show that (when $M$ is sufficiently large)

$$\lim_{n\to\infty} \min_{(x,y) \in \Omega_n \times (\Omega_n \setminus E)} \mathrm{P}[T_z^{x,y} \le 11Mn] = 1. \tag{5.13}$$

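Finally, to treat the case $x = a_i$, $y = b_j$ (the cases $(a_i, a_j)$ or $(b_i, b_j)$ are treated in the
same manner), $i, j \ge Mn/2$, we use again Lemma 3.5 which asserts that

$$\frac{P_n^{\lfloor sn \rfloor}(a_i, b_j)}{\pi_n(b_j)} \ge \max\left( \frac{\mathrm{P}[T_{z'}^{a_i,b_j} = \lfloor sn \rfloor]}{\pi_n(z')},\ \mathrm{P}[T_{z'}^{a_i,b_j} \le \lfloor sn \rfloor] \right). \tag{5.14}$$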

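Note that $T_{z'}^{a_i,b_j}$ is (cf. the proof of Lemma 5.1) a sum of $i+j$ independent random variables
(not identically distributed) and that

$$\lim_{n\to\infty} \sup_{i,j \ge Mn/2} \left| \frac{1}{i+j} \log \mathrm{E}\big[e^{\lambda T_{z'}^{a_i,b_j}}\big] - \log F(\lambda) \right| = 0. \tag{5.15}$$

One deduces from the Gärtner-Ellis Theorem [8] and the following consequence of laziness

$$\frac{\mathrm{P}[T_{z'}^{a_i,b_j} = t+1]}{\mathrm{P}[T_{z'}^{a_i,b_j} = t]} \ge 1/2, \tag{5.16}$$

that for any $u \in (1, \infty)$,

$$\lim_{n\to\infty} \sup_{i,j \ge Mn/2} \left| \frac{1}{i+j} \log \mathrm{P}\big[T_{z'}^{a_i,b_j} = \lfloor (i+j)u \rfloor\big] + \Psi(u) \right| = 0, \tag{5.17}$$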

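and the convergence holds uniformly on compact sets. Now let us fix $\eta$ which satisfies

$$\Psi\Big(\frac{6s}{s + 6\eta}\Big) < \frac{\log 2}{4M}.$$

Using (5.17), there exists $\delta$ such that for all $n$ sufficiently large, for all $i, j \ge Mn/2$,

$$i + j \le \frac{sn}{6} + \eta n \ \Longrightarrow\ \mathrm{P}\big[T_{z'}^{a_i,b_j} \le \lfloor sn \rfloor\big] \ge 1 - e^{-\delta n},$$
$$i + j \ge \frac{sn}{6} + \eta n \ \Longrightarrow\ \log \mathrm{P}\big[T_{z'}^{a_i,b_j} = \lfloor sn \rfloor\big] \ge -(i+j) \Big[ \Psi\Big(\frac{sn}{i+j}\Big) + \delta \Big]. \tag{5.18}$$

Note that the quantity appearing in the second line satisfies

$$(i+j) \Big[ \Psi\Big(\frac{sn}{i+j}\Big) + \delta \Big] \le 2Mn \Big[ \max\Big( \Psi\Big(\frac{s}{2M}\Big),\ \Psi\Big(\frac{6s}{s + 6\eta}\Big) \Big) + \delta \Big]. \tag{5.19}$$

As $2M\Psi\big(\frac{s}{2M}\big) < \log 2$ (since $s \in (s_M, 12M]$) and $\delta$ can be chosen arbitrarily small,
(5.18) (second line) and (5.19) imply that for sufficiently large $n$, for any $i, j$ satisfying
$i + j \ge \frac{sn}{6} + \eta n$, we have

$$\mathrm{P}\big[T_{z'}^{a_i,b_j} = \lfloor sn \rfloor\big] \ge 2^{-n(1-\delta)}. \tag{5.20}$$

Combining this with (5.18) (first line) and (5.14) we can conclude that

$$\min_{i,j \ge Mn/2} \frac{P_n^{\lfloor sn \rfloor}(a_i, b_j)}{\pi_n(b_j)} \ge 1 - e^{-\delta n}. \tag{5.21}$$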
5.1. Concerning Remark 1.3. Note that by performing a minor modification in the
above construction we can bring the pre-cutoff ratio for total-variation to the largest
possible value: 2. A way to achieve this is to make one of the branches linking $z'$ to $z$
much faster than the other (instead of only twice as fast as in Example 3, we want the
ratio of speeds to tend to infinity).

What we can do is to make these branches of length $\lceil \sqrt{n} \rceil$ while $A$ and $B$ are of length
$n$. Furthermore, we choose the speed on one branch to be 1/6 while that on the other
is $1/(6\sqrt{n})$, by increasing the holding probability on this branch (see Figure 6). Using
similar reasoning as in the analysis of Example 3 one can show that for this construction
there is separation cutoff around time $12n$ (note that here $-\log \pi_n(z') = \Theta(\sqrt{n})$ which
by (5.4) implies that for $t_n := \lceil (12 - \varepsilon)n \rceil$, $P_n^{t_n}(a,b)/\pi_n(b) \le \mathrm{P}[T_{z'}^{a,b} \le t_n]/\pi_n(z') = o(1)$,
for every $\varepsilon > 0$).

We can also find a similar example with transition rates bounded uniformly away from zero
by considering two branches of different lengths, but in that case the analysis turns out
to be more intricate.


Figure 6. A modification of the graph size and of the holding probability along the
slow branch as shown above yields a counter-example with cutoff in separation and the
maximal possible pre-cutoff ratio 2 in total-variation.


6. Transforming Examples 2 and 3 into lazy simple random walk on bounded degree expander graphs

6.1. General comments and preliminaries. In this section we transform Examples 2
and 3 into lazy SRWs on a sequence of bounded degree expander graphs. For this kind of
walk, the equilibrium measure is $\pi(v) = \deg v / \sum_u \deg u$, and thus no particular vertex can have
the role of the "center of mass" as in the previous examples. Let us rewrite the definition
of the Cheeger constant in this context.

Definition 6.1. Let $G = (V,E)$ be a finite connected graph. For every $S \subset V$ denote
$C_S := \sum_{u \in S} \deg u$. For any $S \subset V$ we define its edge boundary $\partial_E S$ to be the collection
of edges having one vertex in $S$ and the other in $V \setminus S$. The Cheeger constant of lazy
simple random walk on $G$ is defined as

$$\mathrm{ch}_{\mathrm{Lazy}}(G) := \min_{S :\, \pi(S) \le 1/2} |\partial_E S| / (2C_S),$$

which coincides with Definition 3.6 (see e.g. ca Remark 7.2]). We say that $G$ is a $c$-lazy
expander if $\mathrm{ch}_{\mathrm{Lazy}}(G) \ge c$. We say that a sequence of finite graphs $(G_n)_{n \ge 1}$ is a family of
$c$-lazy expanders if $\inf_n \mathrm{ch}_{\mathrm{Lazy}}(G_n) \ge c$.


In our new context, the center of mass is rather a set which contains a positive fraction 
of the vertices. We shall relate the mixing-time of the chain to the hitting time of this set. 
Mutatis mutandis, the results of Section 3 and in particular Lemma 3.5 can be adapted to
this new context, but only if the set and the starting point satisfy a special relation:


Definition 6.2 (Balanced sets). For any $Z \subset \Omega$ we denote the hitting time of $Z$ by
$T_Z := \inf\{t : X_t \in Z\}$.

• We say that $Z$ is balanced seen from $x \in \Omega$ if for all $t$ such that $\mathrm{P}_x[T_Z = t] > 0$,

$$\forall z \in Z, \qquad \mathrm{P}_x[X_t = z \mid T_Z = t] = \pi_Z(z), \tag{6.1}$$

where $\pi_Z(\cdot) = 1_{\{\cdot \in Z\}}\, \pi(\cdot)/\pi(Z)$ is $\pi$ conditioned on the set $Z$.

• We say that $Z$ is balanced seen from the set $A$ if it is balanced seen from $x$ for all
$x \in A$.

• We define $T_Z^{x,y}$ to be a random variable distributed like the sum of two independent
realizations of $T_Z$, once under $\mathrm{P}_x$ and once under $\mathrm{P}_y$. That is, for all $t \ge 0$,

$$\mathrm{P}[T_Z^{x,y} = t] := \sum_{k=0}^{t} \mathrm{P}_x[T_Z = k]\, \mathrm{P}_y[T_Z = t-k]. \tag{6.2}$$

Note that sets are not likely to be balanced by “pure luck” and we will be careful to 
introduce a sufficient amount of symmetry when constructing our graphs, so that our 
center of mass will be balanced seen from many starting points. However, this property 
cannot be satisfied for all starting points and we will have to deal with the remaining 
initial vertices separately (and show that they are irrelevant for determining the worst-case
total-variation and separation distances), by using a crude $\ell_2$ estimate (Lemma 6.8).


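Lemma 6.3. Let $(\Omega, P, \pi)$ be a finite irreducible lazy reversible Markov chain and consider
$x, y \in \Omega$, and $Z \subset \Omega$ which is balanced seen from both $x$ and $y$.

(i) For all $t \ge 0$ we have

$$P^t(x,y)/\pi(y) \ge \sum_{k \le t} \mathrm{P}[T_Z^{x,y} = k]\, \mathrm{P}_{\pi_Z}^{t-k}(Z)/\pi(Z) \ge \mathrm{P}[T_Z^{x,y} \le t]. \tag{6.3}$$

(ii) If $\mathrm{P}_x[T_Z \le T_y] = 1$ (i.e. if every path from $x$ to $y$ goes through the set $Z$) then for
all $t \ge 0$ we have that

$$\frac{P^t(x,y)}{\pi(y)} = \sum_{k \le t} \mathrm{P}[T_Z^{x,y} = k]\, \frac{\mathrm{P}_{\pi_Z}^{t-k}(Z)}{\pi(Z)} \le \mathrm{P}[T_Z^{x,y} \le t] + \frac{t_{\mathrm{rel}}}{2} \max_{k \in \mathbb{N}} \mathrm{P}[T_Z^{x,y} = k] \sqrt{(1 - \pi(Z))/\pi(Z)}. \tag{6.4}$$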


We use this result directly but also to prove the following key propositions whose aim
is to replace Propositions 3.3 and 3.8.
Proposition 6.4. Let $(\Omega_n, P_n, \pi_n)$ be a sequence of lazy reversible irreducible finite chains
which satisfies the product condition. Assume that there exist sequences of sets
and vertices $I_n, Z_n \subset \Omega_n$, $a = a(n) \in \Omega_n$ which satisfy

(i) $\inf_n \pi_n(Z_n) > 0$.

(ii) $Z_n$ is balanced seen from $I_n$ for all $n$.

(iii) $\limsup_{n\to\infty} \sup_{t \ge 0} \big[ \max_{i \in I_n} \mathrm{P}_i[T_{Z_n} > t] - \mathrm{P}_a[T_{Z_n} > t] \big] \le 0$.

(iv) $\limsup_{n\to\infty} \sup_{t \ge 0} \big[ \max_{x \in \Omega_n \setminus I_n} \|\mathrm{P}_x^t - \pi\|_{\mathrm{TV}} - \mathrm{P}_a[T_{Z_n} > t] \big] \le 0$.

Let $\tau_n(p) := \inf\{t : \mathrm{P}_a[T_{Z_n} > t] \le p\}$. Then

$$\limsup_{n\to\infty} \frac{t_{\mathrm{mix}}^{(n)}(\varepsilon')}{\tau_n(\varepsilon)} \le 1 \quad\text{and}\quad \liminf_{n\to\infty} \frac{t_{\mathrm{mix}}^{(n)}(\varepsilon)}{\tau_n(\varepsilon')} \ge 1, \quad\text{for all } 0 < \varepsilon < \varepsilon' < 1. \tag{6.5}$$

In particular, total-variation cutoff occurs if and only if

$$\lim_{n\to\infty} \frac{\tau_n(\varepsilon)}{\tau_n(1-\varepsilon)} = 1, \quad\text{for every } 0 < \varepsilon < 1. \tag{6.6}$$

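Proposition 6.5. Let $(\Omega_n, P_n, \pi_n)$ be a sequence of lazy reversible irreducible finite Markov
chains which satisfies ($\star$). Assume that there exist sequences of sets and vertices, $A_n, B_n, Z_n \subset
\Omega_n$, $a_n \in A_n$, $b_n \in B_n$, which satisfy

(i) $\inf_n \pi_n(Z_n) > 0$.

(ii) $\lim_{n\to\infty} \max_k \mathrm{P}_{a_n}[T_{Z_n} = k] = 0$.

(iii) $Z_n$ is balanced seen from $I_n := A_n \cup B_n$.

(iv) $\mathrm{P}_x[T_{Z_n} < T_{B_n}] = 1$ for all $x \in A_n$.

(v)

$$\limsup_{n\to\infty} \sup_{t \ge 0} \Big[ \max_{y \in B_n} \mathrm{P}_y[T_{Z_n} > t] - \mathrm{P}_{b_n}[T_{Z_n} > t] \Big] = 0,$$
$$\limsup_{n\to\infty} \sup_{t \ge 0} \Big[ \max_{y \in I_n} \mathrm{P}_y[T_{Z_n} > t] - \mathrm{P}_{a_n}[T_{Z_n} > t] \Big] = 0.$$

(vi)

$$\liminf_{n\to\infty} \inf_{t \ge 0} \min_{(x,y) \in A_n^2 \cup (\Omega_n^2 \setminus I_n^2)} \left( \frac{P_n^t(x,y)}{\pi_n(y)} - \frac{P_n^t(a_n, b_n)}{\pi_n(b_n)} \right) \ge 0.$$

Then

$$\lim_{n\to\infty} \sup_{t \ge 0} \left| d_{\mathrm{sep}}^{(n)}(t) - \mathrm{P}[T_{Z_n}^{a_n,b_n} > t] \right| = 0. \tag{6.8}$$

In particular, there is separation cutoff if and only if $T_{Z_n}^{a_n,b_n}$ is concentrated around its
median.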


Remark 6.6. Note that the results presented above are generalizations of those presented
in Section 3. Hence we shall only prove the more general versions in the Appendix.

Remark 6.7. Similarly to Remark 3.9, condition (ii) of Proposition 6.5 is satisfied in
Examples 4 and 5 due to laziness and the fact that $\lim_{n\to\infty} \min\{t : P_n^t(a_n, Z_n) > 0\} = \infty$.

Lemma 6.8. For any reversible Markov chain $(\Omega, P, \pi)$ and any $x, y \in \Omega$ and $s, t \ge 0$,

$$P^{s+t}(x,y)/\pi(y) \ge \big(1 - \|\mathrm{P}_x^t - \mathrm{P}_y^s\|_{\mathrm{TV}}\big)^2. \tag{6.9}$$

In particular, if $d_x(t) + d_y(s) \le 1$, then

$$P^{s+t}(x,y)/\pi(y) \ge \big(1 - d_x(t) - d_y(s)\big)^2. \tag{6.10}$$
Proof. Let $f(z) := \sqrt{P^t(x,z) P^s(y,z)}/\pi(z)$. By reversibility and Jensen's inequality,

$$\frac{P^{s+t}(x,y)}{\pi(y)} = \sum_z \frac{P^t(x,z) P^s(z,y)}{\pi(y)} = \sum_z \pi(z) f(z)^2 \ge \Big( \sum_z \pi(z) f(z) \Big)^2$$
$$\ge \Big( \sum_z \min\big(P^t(x,z), P^s(y,z)\big) \Big)^2 = \big(1 - \|\mathrm{P}_x^t - \mathrm{P}_y^s\|_{\mathrm{TV}}\big)^2.$$

(6.10) follows from (6.9) by the triangle inequality.

□
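
Inequality (6.9) is simple to test numerically. The sketch below (an illustration, not part of the original text) draws a random reversible chain from symmetric positive weights and returns the smallest value of $P^{s+t}(x,y)/\pi(y) - (1 - \|\mathrm{P}_x^t - \mathrm{P}_y^s\|_{\mathrm{TV}})^2$ over all pairs $(x,y)$, which should be non-negative up to rounding. The random chain and the function names are assumptions of this example.

```python
import numpy as np

def random_reversible_chain(n, seed=0):
    """Random walk on a complete weighted graph with random symmetric weights."""
    rng = np.random.default_rng(seed)
    W = rng.random((n, n))
    W = W + W.T                       # symmetric weights give a reversible chain
    w = W.sum(axis=1)
    return W / w[:, None], w / w.sum()

def lemma_6_8_slack(P, pi, s, t):
    """min over (x, y) of P^{s+t}(x,y)/pi(y) - (1 - TV(P^t(x,.), P^s(y,.)))^2."""
    Ps = np.linalg.matrix_power(P, s)
    Pt = np.linalg.matrix_power(P, t)
    Pst = Ps @ Pt
    worst = np.inf
    for x in range(len(pi)):
        for y in range(len(pi)):
            tv = 0.5 * np.abs(Pt[x] - Ps[y]).sum()
            worst = min(worst, Pst[x, y] / pi[y] - (1.0 - tv) ** 2)
    return worst

P, pi = random_reversible_chain(6)
for s, t in [(1, 1), (2, 3), (5, 1), (4, 4)]:
    print(s, t, lemma_6_8_slack(P, pi, s, t))
```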


6.2. Building blocks of our constructions. Let us now describe the building blocks
of our constructions. We assume for simplicity that $n$ is an even integer. To produce
the analog of a biased nearest-neighbor random walk, our constructions must include
structures which look like regular trees (for which the SRW has a bias towards the leaves).
We must also take care to add some extra connections to avoid producing dead-ends
at the leaves (which could lead to a small Cheeger constant). Moreover, we must introduce
extra symmetries to ensure that the center of mass is balanced seen from all vertices which
are sufficiently far from it. Finally, we "stretch" the edges which are far away from the
center of mass (that is, replace each such edge by a path of length $L$, for some fixed large
constant $L$), to ensure that the worst-case total-variation and separation distances are
attained by vertices which are far away from the center of mass (which is balanced, seen
from those vertices).

Step 1: Let $T_a = (V_a, E_a)$ be a binary tree of depth n rooted at a (in the rest of the
construction, we keep calling a the root, even though the graph will no longer be a tree).
Replace each edge between a pair of vertices belonging to the first n/2 generations of $T_a$
by a path of L edges, where L is an integer which does not depend on n. As L shall remain
fixed, we omit the dependence on L from our notation. In the course of the proof we will
have to require L to be sufficiently large for the purpose of applying a certain crude $\ell_2$
estimate. We call the obtained graph $H_n^1$. It is a tree rooted at a and we denote its set of
leaves by

$$\mathcal{L}_n := (u_1^n, \dots, u_{2^n}^n)$$

($\mathcal{L}_n$ stands for the n-th generation of $T_a$), where the labels are chosen in an arbitrary
fashion.
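To make Step 1 concrete, here is a minimal Python sketch that builds the stretched tree and checks the depth of its leaves. It assumes that the stretched edges are exactly those joining generation g to generation g+1 with g < n/2, which is the reading consistent with the leaves lying at distance (L+1)n/2 from the root as stated in Step 2; the function names and the integer vertex labels are hypothetical.

```python
from collections import deque

def build_stretched_tree(n, L):
    """Adjacency lists of a depth-n binary tree in which every edge between
    generations g and g+1 with g < n/2 is replaced by a path of L edges.
    Vertex 0 plays the role of the root a. Returns (adjacency, leaves)."""
    adj = {0: []}
    next_id = 1
    frontier = [0]
    for g in range(n):
        length = L if g < n // 2 else 1      # stretched near the root, plain edges afterwards
        new_frontier = []
        for parent in frontier:
            for _ in range(2):               # two children per tree vertex
                prev = parent
                for _ in range(length):      # lay down a path of `length` edges
                    v = next_id
                    next_id += 1
                    adj[v] = [prev]
                    adj[prev].append(v)
                    prev = v
                new_frontier.append(prev)
        frontier = new_frontier
    return adj, frontier

def distances_from(adj, source):
    """Breadth-first-search graph distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist
```

For example, `build_stretched_tree(4, 3)` produces a tree of maximal degree 3 whose 16 leaves all lie at distance (3+1)·4/2 = 8 from the root.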

On $H_n^1$ the walker starting from a will have a bias towards the set of leaves, which can be
considered as the center of mass of this graph, since it contains a positive proportion of
the vertices. The parameter L here is present only to make the walk slower (the expected
number of steps to cross an L-path is $2L^2$, i.e. if $v \in H_n^1$ is either the root a or a vertex
of degree 3 adjacent to three degree 2 vertices, then $E_v[\inf\{t : D(X_t, v) = L\}] = 2L^2$, where
D denotes the graph distance). This shall assist us in verifying that the worst-case total-
variation and separation distances are attained by vertices which are far away from the
center of mass.
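The value $2L^2$ can be recovered by first-step analysis on the distance from v, which for the lazy walk holds with probability 1/2, moves from 0 to 1 with probability 1/2, and moves from 0 < k < L to k-1 or k+1 with probability 1/4 each. The short Python sketch below solves the resulting recursion in integer arithmetic; the function name is hypothetical and the sketch is only an illustration of the computation, not a part of the original proof.

```python
def expected_crossing_time(L):
    """Expected number of steps for the lazy walk started at v to reach distance L.

    Writing h(k) for the expected time from distance k, first-step analysis gives
    h(0) = 2 + h(1) and h(k) = 2 + (h(k-1) + h(k+1)) / 2 for 0 < k < L, with h(L) = 0.
    We track c_k := h(k) - h(0), which does not involve the unknown h(0).
    """
    offsets = [0, -2]                      # c_0 = 0 and c_1 = -2
    for k in range(1, L):
        # rearranged interior equation: h(k+1) = 2 h(k) - h(k-1) - 4
        offsets.append(2 * offsets[k] - offsets[k - 1] - 4)
    return -offsets[L]                     # h(L) = 0 forces h(0) = -c_L
```

The recursion yields $c_k = -2k^2$, so the returned value is $2L^2$ for every integer L ≥ 1.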

The problem with this construction is that, seen from a vertex which is not a, the set of
leaves is not balanced. To cope with this defect, we add n extra "generations" of vertices,
which make the center of mass balanced from "many" starting points.

Step 2: For all $1 \le m \le n$ we label the vertices of the "(n+m)-th generation" (they are
at distance (L+1)n/2 + m from a) as follows

Cn+m [4 ],k £ [2 n ~ m ]} 

and we connect them to generation n+m—1 using the following scheme: for all k € [2 n ~ m ] 

A are connected to u 2k and uj k _■ 

— 1 ,^* Lm — 1 ‘lr-rm-1 

We call the obtained graph $H_n^2$. The "center of mass" of $H_n^2$ is the set $\mathcal{L}_{2n}$ (it bears
roughly half of the total mass of $H_n^2$), which is balanced seen from any vertex in $H_n^1$.

Step 3.1 and 3.2: We now want to plug (attach) to the leaf set of $H_n^2$ "two paths" with
different speeds (to have something similar to the structures present in Examples 2 and
3). The construction is the following (see Figure 7):

(i) We start with a rooted binary tree T of depth n (assume $n \ge 4$). Let us call
1 and 2 the two neighbors of the root, and $T_1$ and $T_2$ the subtrees rooted at 1 and
2, respectively.

(ii) In $T_1$ we add edges between any pair of vertices which have a common ancestor
and are not leaves.

(iii) Finally we assign labels to the leaf sets of $T_1$ and $T_2$ in such a way that the two labeled
trees (prior to step (ii), that is) are isomorphic (see e.g. Figure 7), and we merge
each leaf of $T_1$ with the leaf of $T_2$ with the same label. We let $T_n$ denote the obtained
graph.

(iv) We let $T'_n$ denote the graph which is obtained by the same construction, in which
we also add edges within $T_2$ in step (ii) using the same rule as for $T_1$ (see Figure 7).

To each vertex $v \in \mathcal{L}_{2n}$ we glue a copy of $T_n$ (v is merged with the root of $T_n$), and we
obtain $H_n^{3,1}$. If we glue a copy of $T'_n$ (to each $v \in \mathcal{L}_{2n}$) instead of $T_n$, we obtain $H_n^{3,2}$. For
both graphs we call $\mathcal{L}_{3n}$ the set of vertices at distance (L+5)n/2 (i.e. maximal distance)
from a.


Figure 7. Representations of $T_n$ (on the left) and $T'_n$ (on the right) for n = 4. The red
edges are those added in step (ii). In step (iii), leaves with the same label are merged.


Finally we want to link together all the vertices of $\mathcal{L}_{3n}$ in order to avoid dead-ends in
the graph. We choose to link them together using an explicit expander (see e.g. 0 (2D| 
for examples of explicit construction of expanders) so that (total-variation) mixing occurs 
rapidly once $\mathcal{L}_{3n}$ is reached.

Step 4: We let $F_n = (V_n, E_n)$ be a family of explicit 3-regular c-lazy expanders with
$V_n = [2^{3n-1}]$. We glue together $F_n$ and $H_n^{3,i}$ (i = 1,2) without adding vertices by
identifying $V_n$ with $\mathcal{L}_{3n}$. More precisely, we start with a copy of $H_n^{3,i}$ with root a. We label
the vertices of $\mathcal{L}_{3n}$ by $z_1, \dots, z_{2^{3n-1}}$ (the labeling is arbitrary). We then connect $z_j$ with
$z_k$ if and only if $\{j,k\} \in E_n$. We call the final result of our construction $H_n^{4,i}$ (i = 1,2).
We call a the root of $H_n^{4,i}$ (i = 1,2).

With some effort and using the tools developed in the following sections, the reader
can check that the lazy SRW on $H_n^{4,1}$ exhibits pre-cutoff but not cutoff in total-variation.
This is a SRW version of Aldous' counter-example.

6.3. A sequence of Lazy SRW on bounded degree expanders with total-variation
cutoff and no separation cutoff. The following is a modification of Example 2 into a
sequence of lazy SRWs on a sequence of bounded degree graphs.

Example 4. Take a copy of $H_n^{3,1}$ with root b and a copy of $H_n^{3,2}$ with root a. We glue
the two together by merging the vertices of $\mathcal{L}_{3n}$ (of both graphs): we give labels $z_1, \dots, z_{2^{3n-1}}$
to the vertices lying in $\mathcal{L}_{3n}$ of each of the two graphs, and then merge each pair of vertices
which share the same label. Finally, we build extra connections between $z_1, \dots, z_{2^{3n-1}}$ using
an expander graph $F_n$ with $2^{3n-1}$ vertices, as in Step 4. We let $G_n^1 := (V_n^1, E_n^1)$ denote
the obtained graph.


Figure 8. Schematic representation of Example 4. In the construction of $H_n^{3,1}$, the
asymmetry of $T_n$ produces two different paths to reach the center of mass $Z_n$, with
different speeds. This produces an absence of concentration for the hitting time of $Z_n$
starting from b.

In order to apply Propositions 6.4 and 6.5 we need to identify which vertices and sets
will play which role.

• The center of mass $Z_n$ is given by the $2^{3n-1}$ vertices which are linked by the
expander.

• a is the vertex which maximizes (stochastically) the hitting time of $Z_n$.

• The pair of vertices (x, y) which (up to negligible terms) attains the minimum for
$P_n^t(x,y)/\pi_n(y)$ (for all $t \ge 0$) is given by (a, b).

• The sets $A_n$ and $B_n$ are chosen to be the largest sets of points around a and b
(resp.) such that $Z_n$ is balanced seen from $I_n := A_n \cup B_n$. Namely, these are the
vertices within respective distance (L+1)n/2 from a and b (the vertices of $H_n^1$ in
both $H_n^{3,1}$ and $H_n^{3,2}$). Indeed, due to Step 2 of the construction, the set $\mathcal{L}_{2n}$ of $H_n^{3,1}$,
respectively $H_n^{3,2}$ (i.e. the collection of vertices whose distance from a (resp. b) is
(L+3)n/2), is balanced seen from $A_n$, resp. $B_n$. This implies that the distribution
of $X_{T_{Z_n}}$ is uniform on $Z_n$. Step (iv) of the construction of $T_n$ is there to guarantee
that $T_{Z_n}$ and $X_{T_{Z_n}}$ are independent (and hence that $Z_n$ is balanced seen from $A_n$
and $B_n$).

It is then not difficult to check (cf. Figure 8) from the construction that assumptions
(i)-(iii), resp. (i)-(v), of Propositions 6.4 and 6.5 are satisfied.

Moreover, the hitting time of $Z_n$ from a is concentrated around $(17 + 3L^2)n$, while from
b it satisfies

$$\lim_{n\to\infty} P_b[T_{Z_n} > sn] = \begin{cases} 1 & \text{if } s < 15 + 3L^2, \\ 1/2 & \text{if } s \in (15 + 3L^2, 17 + 3L^2), \\ 0 & \text{if } s > 17 + 3L^2. \end{cases} \qquad (6.11)$$
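By (6.8), the separation profile is governed by the sum of two independent hitting times of $Z_n$, one started from a and one started from b. The following Python sketch, under the assumption that the rescaled hitting times converge to the discrete limits described above (a point mass for a, two atoms of mass 1/2 for b), computes the limiting tail of the rescaled sum; it recovers the thresholds $32 + 6L^2$ and $34 + 6L^2$ that appear in (6.12). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def limiting_tail(atoms_a, atoms_b, s):
    """Limit of P[(T_a + T_b)/n > s] when T_a/n and T_b/n converge to independent
    discrete limits, given as {value: probability} dictionaries."""
    return sum(pa * pb
               for (va, pa), (vb, pb) in product(atoms_a.items(), atoms_b.items())
               if va + vb > s)

def example4_profiles(L):
    """Limiting laws of T_{Z_n}/n from a (concentrated) and from b (two atoms), as in (6.11)."""
    from_a = {17 + 3 * L**2: Fraction(1)}
    from_b = {15 + 3 * L**2: Fraction(1, 2), 17 + 3 * L**2: Fraction(1, 2)}
    return from_a, from_b
```

For instance, with L = 2 the two atoms of the sum sit at 56 = 32 + 6·4 and 58 = 34 + 6·4, so the tail equals 1 below 56, 1/2 strictly between 56 and 58, and 0 above 58.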

We want to prove that the system displays cutoff in total-variation around time $(17+3L^2)n$,
and that the asymptotic behavior for the separation distance is given by

$$\lim_{n\to\infty} d_{\mathrm{sep}}^{(n)}(sn) = \lim_{n\to\infty} P_b[T_{Z_n} > (s - 17 - 3L^2)n] = \begin{cases} 1 & \text{if } s < 32 + 6L^2, \\ 1/2 & \text{if } s \in (32 + 6L^2, 34 + 6L^2), \\ 0 & \text{if } s > 34 + 6L^2. \end{cases} \qquad (6.12)$$

The only thing we have to do to prove these statements is to verify condition (iv) in
Proposition 6.4 and condition (vi) of Proposition 6.5 (resp.). The only delicate point is to
show that for starting points outside of $I_n$ the walk mixes rapidly, i.e. that there exists
an absolute constant C > 0, which does not depend on L, such that

$$\lim_{n\to\infty} \max_{v \notin I_n} d_v^{(n)}(\lceil Cn \rceil) = 0. \qquad (6.13)$$

Before proving (6.13), let us explain how we use it to verify the remaining conditions.
Note that if L is chosen to be sufficiently large (i.e. such that $17 + 3L^2 > C$), then (6.13)
implies condition (iv) of Proposition 6.4.


For condition (vi) of Proposition 6.5, for the case $x \in \Omega_n$, $y \notin I_n$, we use Lemma 6.8
and the total-variation cutoff result to show that for $t \ge (18 + 3L^2 + C)n$

$$P_n^t(x,y)/\pi_n(y) \ge 1 - 2\left( d_x^{(n)}((18 + 3L^2)n) + d_y^{(n)}(Cn) \right), \qquad (6.14)$$

which is uniformly close to one.

This yields the right condition provided $32 + 6L^2 > 18 + 3L^2 + C$ (which can obviously
be fulfilled by picking L to be sufficiently large). We now treat the case where both x
and y lie in $A_n$ (whose analysis does not rely on (6.13)). We use Lemma 6.3 with $Z = Z'_n$
chosen to be the set of vertices within distance (L+3)n/2 from a (corresponding to $\mathcal{L}_{2n}$
in the copy of $H_n^{3,2}$). Recall that by construction this set is balanced seen from $A_n$. By
(6.3) we have that

$$P_n^t(x,y)/\pi_n(y) \ge P[T_{Z'_n}^{x,y} \le t]. \qquad (6.15)$$

Moreover, for any $\varepsilon > 0$,

$$\lim_{n\to\infty} \min_{x,y \in A_n} P[T_{Z'_n}^{x,y} \le (6L^2 + 18 + \varepsilon)n] = 1, \qquad (6.16)$$
















and this suffices to conclude that condition (vi) of Proposition 6.5 indeed holds.

Now let us prove (6.13). We want to use a simple $\ell_2$ bound using the Poincaré inequality
(see Lemma A.1). The issue is that the spectral gap of our graph is rather small (of order
$L^{-2}$) due to the presence of stretched edges. However, starting outside of $I_n$, the walk has
a very small chance of visiting the part of the graph where the edges are stretched before
it is already extremely well mixed. Hence our idea is to apply the $\ell_2$ bound for the walk
on a smaller graph which corresponds to the vertices which are likely to be visited. This
graph will have no stretched edges and a spectral gap which is bounded away from zero
and does not depend on L.

We let $\tilde G_n^1 = (\tilde V_n^1, \tilde E_n^1)$ denote the graph which is obtained from $G_n^1$ when all the vertices
within distance Ln/2+1 from a and b have been deleted, together with all edges connected
to them. First we observe that the Cheeger constant associated to $\tilde G_n^1$ is large (i.e. it is
bounded from below by some positive absolute constant, which is independent also of L),
see e.g. Lemma 2.1 in [18] for a proof.

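Proposition 6.9. Let $\kappa := (\min(c/3, 1/18))^2/2$. Then

$$\mathrm{ch}_{\mathrm{Lazy}}(\tilde G_n^1) \ge \sqrt{2\kappa}. \qquad (6.17)$$

Consequently, the relaxation-time of the lazy SRW on $\tilde G_n^1$, $t_{\mathrm{rel}}^{(n)}$, satisfies

$$t_{\mathrm{rel}}^{(n)} \le \kappa^{-1}. \qquad (6.18)$$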

If we let $\tilde P_x^t$ and $\tilde \pi_n$ refer to the distribution at time t and at equilibrium for the walk
on $\tilde G_n^1$, this implies (by Lemma A.1) that for $x \in \tilde V_n^1$, for all $t \ge n\kappa^{-1}\log 9$,

IIP* - TTnllTV < —- \ , , e~ Kt < ( r 

minj 7 T n (y) 


e~ Kt < max deg v ) |K|9 _n < 6(8/9) 

\veV„ 


(6.19) 


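What remains to be proven is that if one considers $\tilde V_n^1$ as a subset of $V_n^1$, then for any
$x \in \tilde V_n^1 \setminus I_n$, the distances $\|\tilde P_x^t - \tilde \pi_n\|_{TV}$ and $\|P_x^t - \pi_n\|_{TV}$ are very close. Note that

$$\|P_x^t - \pi_n\|_{TV} \le \|P_x^t - \tilde P_x^t\|_{TV} + \|\tilde P_x^t - \tilde \pi_n\|_{TV} + \|\pi_n - \tilde \pi_n\|_{TV}. \qquad (6.20)$$

The term $\|\pi_n - \tilde \pi_n\|_{TV}$ is exponentially small in n because only an exponentially small
fraction of the vertices of $G_n^1$ lie outside of $\tilde G_n^1$. Now if one lets $T_{\partial \tilde V_n^1}$ denote the hitting
time of

$$\partial \tilde V_n^1 := \{x \in V_n^1 \setminus \tilde V_n^1 : \exists y \in \tilde V_n^1, \ x \sim y\}$$

(recall that $\tilde V_n^1$ is the vertex set of $\tilde G_n^1$), we have (by a standard coupling argument) that

$$\|P_x^t - \tilde P_x^t\|_{TV} \le P_x[T_{\partial \tilde V_n^1} \le t] \le \sum_{i=0}^{t} P_x^i(\partial \tilde V_n^1). \qquad (6.21)$$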


Now if $x \in \tilde V_n^1 \setminus I_n$, it lies at distance at least n/2 from $\partial \tilde V_n^1$ and has to overcome
a drift to reach it. For this reason it should take time which is exponentially large in
n. More rigorously, we let $\Omega_x$ be the set of vertices $y \in \tilde V_n^1$ such that there exists a
graph automorphism of $G_n^1$ preserving a and b which maps x to y (in most cases this is
just a pedantic manner to describe the set of points at a fixed distance from a, but we
have to introduce this definition due to the lack of symmetry of the b-side). Note that
$|\Omega_x| / |\partial \tilde V_n^1| \ge 2^{n/2}$ if $x \notin I_n$. Hence we have for all $i \ge 0$ and $x \notin I_n$ that

$$P_x^i(\partial \tilde V_n^1) = \frac{\sum_{y \in \Omega_x} \pi_n(y) P_y^i(\partial \tilde V_n^1)}{\pi_n(\Omega_x)} \le \frac{\pi_n(\partial \tilde V_n^1)}{\pi_n(\Omega_x)} \le \max_{v \in V_n^1} \deg(v) \, \frac{|\partial \tilde V_n^1|}{|\Omega_x|} \le \frac{\max_{v \in V_n^1} \deg(v)}{2^{n/2}}, \qquad (6.22)$$

where in the first inequality we have used the stationarity of $\pi_n$,

$$\sum_{y \in V_n^1} \pi_n(y) P_y^i(\partial \tilde V_n^1) = \pi_n(\partial \tilde V_n^1).$$

Plugging this in (6.21) we obtain (6.13), or more precisely:

Corollary 6.10. Set $t_n := \lceil n\kappa^{-1}\log 9 \rceil$. Then

$$\lim_{n\to\infty} \max_{x \in V_n^1 \setminus I_n} \|P_x^{t_n} - \pi_n\|_{TV} = 0. \qquad (6.23)$$


6.4. A sequence of Lazy SRW on bounded degree expanders with separation 
cutoff and no total-variation cutoff. The following is a modification of Example 3
into a sequence of lazy SRWs on a sequence of bounded degree graphs.

Example 5. Take a copy of Hn with root a and a copy of with root b. We glue them 
together as follows: we give labels in $[2^{2n}]$ to the vertices in $\mathcal{L}_{2n}$ in the two graphs and
merge the vertices which share the same labels. We denote the set of merged vertices by
$Z'_n$ (this is the set of vertices at distance (L+3)n/2 from a and b). Let G\ denote the
obtained graph. 





Figure 9. Schematic representation of Example 5. Here the asymmetry of $T_n$ is used
to avoid cutoff in total-variation, in a similar manner to what is done in Example 3.

However, as in Example 3, the separation mixing-time is determined by the behavior of
$T_{Z'}^{a,b}$ in the large deviation regime. Note that Z' is a set of small equilibrium measure (it
has $4^n$ vertices whereas the full graph has order $8^n$ vertices).

The reader can easily check that here a and b play symmetric roles. We let $A_n$ and $B_n$
denote the vertices within distance (L+1)n/2 from a and b, respectively. Moreover,

• The center of mass $Z_n$ is given by the $2^{3n-1}$ vertices which are linked by the
expander (which are the vertices belonging to $\mathcal{L}_{3n}$ of $H_n^{3,1}$).

• $Z_n$ is balanced seen from $A_n \cup B_n$.

• a and b maximize (stochastically) the hitting time of $Z_n$.

It is then not difficult to check (see Figure 9) from the construction that assumptions (i)-(iii)
of Proposition 6.4 are satisfied. Assumption (iv) can be shown to be satisfied as in the
previous example by using an $\ell_2$ bound for the graph in which points within distance
Ln/2 of a and b have been deleted.

The asymptotic behavior of the hitting time of $Z_n$ from a (or b) is once again given by
(6.11), and hence the system does not display cutoff in total-variation.

For cutoff in separation, we cannot use Proposition 6.5. We use instead Lemma 6.3,
and the relevant set to hit is $Z'_n$. This set is balanced seen from $I_n := A_n \cup B_n$ and
thus is the relevant one for the purpose of computing the separation mixing time. An
analog of the analysis performed for Example 3 does the job. To control the quantity
$P_n^t(x,y)/\pi_n(y)$ when one of x and y (or both) does not belong to $A_n \cup B_n$, we use an $\ell_2$
estimate (in conjunction with Lemma 6.8) for the subgraph obtained by deleting the
stretched edges of the graph of Example 5, similarly to what we have done in the analysis
of Example 4.

6.5. Proof of Remark 1.4. Part (i) follows from the analysis of Example 4. We shall
prove now that part (ii) is satisfied by Example 5.

We denote by $\pi_{Z'}$ the distribution of $\pi_n$ conditioned on Z' (suppressing the dependence
on n). By (6.4) we have that for all t and every $x \in A_n$ and $y \in B_n$,

$$P_n^t(x,y)/\pi_n(y) = \sum_{k \le t} P[T_{Z'}^{x,y} = k] \, P_{\pi_{Z'}}^{t-k}(Z')/\pi_n(Z') \ge P[T_{Z'}^{x,y} = t]/\pi_n(Z'). \qquad (6.24)$$

We know from the previous analysis of Example 5 that for the separation distance to
equilibrium only $(x,y) \in A_n \times B_n$ matter, or more precisely

$$\lim_{n\to\infty} \sup_{t \ge 0} \left| d_{\mathrm{sep}}^{(n)}(t) - \max\Big(0, 1 - \min_{(x,y) \in A_n \times B_n} P_n^t(x,y)/\pi_n(y)\Big) \right| = 0. \qquad (6.25)$$

Hence setting

$$t_\eta^{(n)}(x,y) := \min\{t : P_n^t(x,y)/\pi_n(y) \ge 1 - \eta\},$$

we prove that the cutoff window is constant by proving that, for all $\varepsilon > 0$, there exist some
$n_\varepsilon \in \mathbb{N}$ and some absolute constant $C_2$ such that for all $n \ge n_\varepsilon$ and all $(x,y) \in A_n \times B_n$

$$t_\varepsilon(x,y) - t_{1-\varepsilon}(x,y) \le C_2 |\log \varepsilon|, \qquad (6.26)$$

$$\forall t \ge t_\varepsilon(x,y), \quad P_n^t(x,y)/\pi_n(y) \ge 1 - \varepsilon. \qquad (6.27)$$

In what follows, for simplicity, we drop the dependence on n in the notation $t_\eta(x,y)$.
Although this is not used in the analysis below (and hence not proven), we can identify
$t_{1/4}(x,y)$ for all $(x,y) \in A_n \times B_n$ as follows:

$$\max\{|t_{1/4}(x,y) - t'(x,y)|, \ |t_{1/4}(x,y) - \bar t(x,y)|\} \le C_3,$$

where $t'(x,y) := \inf\{t : P[T_{Z'}^{x,y} \le t] \ge \pi_n(Z')\}$ and $\bar t(x,y) := \inf\{t : P[T_{Z'}^{x,y} = t] \ge \pi_n(Z')\}$.
This follows from the analysis below, together with (6.24) and the exponential decay of
$P_{\pi_{Z'}}^t(Z') - \pi_n(Z')$ as a function of t.

We start by presenting some general machinery which we shall utilize in the proof of
Remark 1.4. Let $\mu$ be a distribution over $\mathbb{Z}$. We say that $\mu$ is unimodal if for any
$z_1 \le z_2 \le z^*$ and for any $z_1 \ge z_2 \ge z^*$ we have that $\mu(z_1) \le \mu(z_2)$, where $z^*$ is the mode of
$\mu$ (i.e. $\max_z \mu(z) = \mu(z^*)$). We say that $\mu$ is log-concave if $\mu^2(z) \ge \mu(z-1)\mu(z+1)$ for
all z € Z (equivalently, for all z\ < Z 2 (zi,Z 2 € Z) we have that ’ w ^ iere 

0/0 is interpreted as 0).

Fact 6.11. Let $\mu$ be a log-concave distribution over $\mathbb{Z}$. Then $\mu$ is unimodal.

Fact 6.12. The family of Geometric distributions is log-concave. 

Fact 6.13. The family of log-concave distributions over Z is closed under convolutions. 
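The three facts can be illustrated numerically. The Python sketch below, a hedged illustration in exact rational arithmetic rather than a proof, checks that a (truncated) geometric law on {1, 2, ...} satisfies $\mu^2(z) \ge \mu(z-1)\mu(z+1)$, and that the convolution of two such laws is again log-concave and unimodal. The parameters 1/3 and 1/5, the truncation length and the helper names are arbitrary assumptions made for the example.

```python
from fractions import Fraction

def geometric_pmf(p, size):
    """Values mu(0), ..., mu(size-1) of the Geom(p) law on {1, 2, ...} (so mu(0) = 0)."""
    return [Fraction(0)] + [p * (1 - p) ** (k - 1) for k in range(1, size)]

def convolve(mu, nu):
    """Convolution of two finitely supported sequences indexed from 0."""
    out = [Fraction(0)] * (len(mu) + len(nu) - 1)
    for i, a in enumerate(mu):
        for j, b in enumerate(nu):
            out[i + j] += a * b
    return out

def is_log_concave(mu):
    """Check mu(z)^2 >= mu(z-1) mu(z+1), treating values outside the list as 0."""
    padded = [Fraction(0)] + list(mu) + [Fraction(0)]
    return all(padded[z] ** 2 >= padded[z - 1] * padded[z + 1]
               for z in range(1, len(padded) - 1))

def is_unimodal(mu):
    """Check that mu is non-decreasing up to a mode and non-increasing afterwards."""
    mode = max(range(len(mu)), key=lambda z: mu[z])
    rising = all(mu[z] <= mu[z + 1] for z in range(mode))
    falling = all(mu[z] >= mu[z + 1] for z in range(mode, len(mu) - 1))
    return rising and falling
```

Only the first `size` entries of the convolution of two truncated sequences coincide with the true convolution, so a check should be restricted to that prefix, e.g. `convolve(geometric_pmf(p, 40), geometric_pmf(q, 40))[:40]`.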

The following representation of hitting times in birth and death chains is due to Karlin
and McGregor [13, Equation (45)]. It was later rediscovered by Keilson [14]. The discrete
time case of this result was given by Fill [11, Theorem 1.2].

Theorem 6.14. Let $([n], P, \pi)$ be a lazy birth and death chain (where $[n] := \{1, 2, \dots, n\}$).
Let $P'$ be defined by $P'(i, \cdot) = P(i, \cdot)$ if $i \in [n-1]$ and $P'(n,n) = 1$. Denote the non-zero
eigenvalues of $I - P'$ by $0 < \beta_1 \le \cdots \le \beta_{n-1} \le 1$. Let $\xi_1, \dots, \xi_{n-1}$ be independent random
variables such that $\xi_i \sim \mathrm{Geom}(\beta_i)$ for all $i \in [n-1]$. Then the distribution of $T_n$ under
$P_1$ is the same as the distribution of $\sum_{i \in [n-1]} \xi_i$.
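To see the theorem at work on the smallest non-trivial case, the following Python sketch compares, for a lazy birth and death chain on {1, 2, 3}, the law of $T_3$ under $P_1$ computed directly from the transition matrix with the law of a sum of two independent geometric variables whose parameters are the non-zero eigenvalues of $I - P'$. The particular chain (the matrix `Q` and the exit probability `EXIT`) and all function names are assumptions chosen for illustration; the eigenvalues are obtained from the quadratic formula, which only covers this 2x2 case.

```python
import math

def hitting_time_pmf(Q, exit_prob, t_max):
    """P_1[T_n = t] for t = 0..t_max, where Q is P restricted to {1..n-1}
    and exit_prob = P(n-1, n)."""
    size = len(Q)
    row = [1.0] + [0.0] * (size - 1)       # law of the killed walk, started at state 1
    pmf = [0.0]
    for _ in range(t_max):
        pmf.append(row[-1] * exit_prob)    # sit at n-1, then jump to n
        row = [sum(row[i] * Q[i][j] for i in range(size)) for j in range(size)]
    return pmf

def geometric_sum_pmf(betas, t_max):
    """pmf on 0..t_max of a sum of independent Geom(beta) variables on {1, 2, ...}."""
    pmf = [1.0] + [0.0] * t_max            # point mass at 0
    for beta in betas:
        geom = [0.0] + [beta * (1 - beta) ** (k - 1) for k in range(1, t_max + 1)]
        pmf = [sum(pmf[i] * geom[t - i] for i in range(t + 1)) for t in range(t_max + 1)]
    return pmf

def eigenvalues_of_I_minus_Q(Q):
    """Eigenvalues of I - Q for a 2x2 matrix Q, via the quadratic formula."""
    a, b = 1 - Q[0][0], 1 - Q[1][1]
    trace, det = a + b, a * b - Q[0][1] * Q[1][0]
    root = math.sqrt(trace * trace - 4 * det)
    return [(trace - root) / 2, (trace + root) / 2]

# Lazy chain on {1, 2, 3}: P(1,.) = (1/2, 1/2, 0), P(2,.) = (1/4, 1/2, 1/4).
Q = [[0.5, 0.5], [0.25, 0.5]]
EXIT = 0.25
```

Because $P'$ is block triangular with state n absorbing, the non-zero eigenvalues of $I - P'$ are exactly the eigenvalues of $I - Q$, which is why the sketch works with `Q` alone.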

We are now ready to prove (6.26) and (6.27). For clarity of exposition, we first present
our analysis for the special case x = a, y = b. Consider the sequence of graphs from
Example 5. Let $\hat G_n$ be the subgraph whose set of vertices is given by

$$\hat V_n := \{v : \mathrm{dist}(v, \{a,b\}) \le (L+3)n/2\},$$

and whose edges are those edges of the graph from Example 5 for which both ends are in
$\hat V_n$. (Note that this graph is connected and includes Z' but not any point further away
from {a, b}.)

Let $(\hat Y_t)_{t \in \mathbb{Z}_+}$ be lazy SRW on $\hat G_n$. Consider the projection $Y_t := 1 + \mathrm{dist}(\hat Y_t, \{a,b\})$. Our
construction implies that the projection is Markovian and thus $(Y_t)_{t \in \mathbb{Z}_+}$ is a lazy birth and
death chain on $[1 + (L+3)n/2]$. For any $v \in \hat V_n$ the distribution of $T_{Z'}$, given that $\hat Y_0 = v$,
is the same as that of $T_{1+(L+3)n/2}$ (for the chain $(Y_t)$), given that $Y_0 = 1 + \mathrm{dist}(v, \{a,b\})$.
Consequently, by Theorem 6.14 and Facts 6.12 and 6.13, the law of $T_{Z'}^{a,b}$, which is a sum of
independent hitting times and thus of geometric variables, is log-concave. Let $z^*$ be the
mode of $T_{Z'}^{a,b}$. A standard computation is sufficient to show that



(in fact, the first inequality follows from unimodality). 

Fix some $\delta > 0$ sufficiently small such that $P[T_{Z'}^{a,b} \le z^* - \delta n] \gg 2^{-n}$ ($2^{-n}$ is the order
of magnitude of $\pi_n(Z')$). By a large-deviation estimate and log-concavity there is some
$\alpha > 1$ such that for all sufficiently large n we have that



hence, again by log-concavity, 



(6.29) 


Consequently, by (16.241) 



Vt < z* — 5n 


(6.30)

As $T_{Z'}^{a,b}$ is log-concave and hence by Fact 6.11 also unimodal, (6.24) also yields that


Vi € [z* — 5n, z*), 


b ) P [T*’, y = k + 1] F " z ' 


vr n(b) 


k<t 


7T n (Z- 


(Z') pt+i( a ,&) 

< n L A (6.31) 


7T n(b) 


and that there exist some absolute constants c, C§ > 0, (3 G (1, 2) such that 


Vi € [z*, z* + n 2//3 ), 


P*(q,6) ^ P[r^ = ^ + rn 2 / 3 ]] 

n n (b) ~ it n (Z') 


(6.32) 


Vi > z* + n 2 / 3 , 1 - < P[T“; b > i] < Ce/n 1 / 3 . (6.33) 

7I "n(a) 

This concludes the proof of the case (x, y) = (a, b), as (6.30) implies (6.26) with $C_2 :=
(\log \alpha)^{-1}$, and (6.27) can be deduced from the four other equations. For general $(x,y) \in
A_n \times B_n$ we decompose $T_{Z'}^{x,y}$ into a convolution of a log-concave distribution and some
other negligible term. Let $(X_t^x)_t$ and $(X_t^y)_t$ be independent realizations of the random
walk, started from respective initial vertices x and y, defined on the same probability space.
Let $T_{Z'}^x := \inf\{t : X_t^x \in Z'\}$ and $T_{Z'}^y := \inf\{t : X_t^y \in Z'\}$. We define $T'_x$ (and $T'_y$ in an
analogous manner, using $(X_t^y)$ and $T_{Z'}^y$) as follows (with the convention $\sup \emptyset = 0$):

$$T'_x := \sup\{t : t \le T_{Z'}^x, \ \mathrm{dist}(X_{t-1}^x, Z') = \mathrm{dist}(x, Z') + 1\}.$$

Note that $T'_x$, $T_{Z'}^x - T'_x$, $T'_y$ and $T_{Z'}^y - T'_y$ are independent. We denote $T_1 = T_1(x,y) :=
(T_{Z'}^x - T'_x) + (T_{Z'}^y - T'_y)$ and $T_2 = T_2(x,y) := T'_x + T'_y$. By Theorem 6.14 and Facts 6.12 and 6.13,
the laws of $T_{Z'}^x - T'_x$ and $T_{Z'}^y - T'_y$ are log-concave (by a similar argument to the one used
before, using a projection to a birth and death chain), and so $T_1$ is also log-concave (by
Fact 6.13). Observe that $T_1 + T_2$ has the same law as $T_{Z'}^{x,y}$.

Denote the mode of $T_1$ by $z^* = z^*(x,y)$. Fix some $\delta > 0$ sufficiently small such that
$\min_{(x,y) \in A_n \times B_n} P[T_1(x,y) \le z^*(x,y) - \delta n] \gg 2^{-n}$. Imitating the proof of (6.30), using a

large-deviation estimate on }~(x*y)-[s n ] } which is uniform in (x, y) (the existence 

of such a uniform large-deviation estimate follows from the analysis of Example [3], or 
alternatively, by 0 Lemma 6.2]), together with log-concavity, we get that if a > 1 is 
chosen sufficiently small, then (6.29) remains valid simultaneously for all choices of x, y,
if one replaces $T_{Z'}^{a,b}$ by $T_1(x,y)$ (and $z^*$ with $z^*(x,y)$). We argue that (6.28)-(6.33) can be
extended (excluding the middle terms) to all $(x,y) \in A_n \times B_n$ (in the role of (a, b)), with
the same choice of constants for all $(x,y) \in A_n \times B_n$. To extend (6.30) and (6.31), note
that after conditioning on $T_2$ we can imitate the above proofs, and so the extensions are
obtained by averaging over $T_2$. For (6.32), note that by unimodality

$$P[T_{Z'}^{x,y} = z^*(x,y) + \lceil n^{2/3} \rceil]/\pi_n(Z') \ge c_1 2^n \, P[T_2(x,y) \le \lceil n^{2/3} \rceil] \, P[T_1(x,y) = z^*(x,y) + \lceil n^{2/3} \rceil].$$

It is not hard to show that there exist some $\gamma < 2$ and $c_2, C_6 > 0$ such that

$$P[T_1(x,y) = z^*(x,y) + \lceil n^{2/3} \rceil] \ge c_2 \gamma^{-n} \quad \text{and} \quad P[T_2(x,y) \le \lceil n^{2/3} \rceil] \ge 1 - C_6 n^{-2/3}$$

for all $(x,y) \in A_n \times B_n$ (by Markov's inequality and the fact that $\max_{(x,y) \in A_n \times B_n} E[T_2(x,y)] =
O(1)$). For (6.28) use unimodality (first inequality) to show that for all $(x,y) \in A_n \times B_n$

$$|z^*(x,y) - E[T_1(x,y)]| \le C_4 \sqrt{\mathrm{Var}(T_1(x,y))} \le C_4 \sqrt{\mathrm{Var}(T_{Z'}^{x,y})} \le C_5 \sqrt{n}.$$

Lastly, for (6.33) use (6.28) and Chebyshev's inequality (by noting that $|z^*(x,y) -
E[T_{Z'}^{x,y}]| \le |z^*(x,y) - E[T_1(x,y)]| + E[T_2(x,y)] \le C'_5 \sqrt{n}$). We leave the details to the
reader. □

Appendix A. Proof of technical results

A.1. Basic ingredients. In our analysis we repeatedly use crude $\ell_2$ bounds in order
to get estimates on total-variation distance. We cite here a standard result (see e.g. m 
Lemma 12.16]). 

Lemma A.1. Let $(\Omega, P, \pi)$ be a finite lazy irreducible reversible Markov chain. Let $\mu$ be
a distribution on $\Omega$ and let $\lambda_2$ be the second largest eigenvalue of P. Then

$$2\|P_\mu^t - \pi\|_{TV} \le \|P_\mu^t - \pi\|_{2,\pi} \le \lambda_2^t \|\mu - \pi\|_{2,\pi}, \quad \text{for all } t \ge 0. \qquad (A.1)$$
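The two inequalities in (A.1) can be checked numerically on a chain whose spectrum is known in closed form. The Python sketch below does so for lazy SRW on the n-cycle started from a point mass, where the second largest eigenvalue is $(1 + \cos(2\pi/n))/2$. It assumes that $\|\nu - \pi\|_{2,\pi}$ denotes the $L^2(\pi)$ norm of the density $\nu/\pi - 1$; the choice of chain, the tolerance and the function names are illustrative assumptions.

```python
import math

def lazy_cycle_step(mu):
    """One step of lazy SRW on the cycle Z/nZ applied to the distribution mu."""
    n = len(mu)
    return [mu[x] / 2 + (mu[(x - 1) % n] + mu[(x + 1) % n]) / 4 for x in range(n)]

def l2_bound_holds(n=8, max_time=40, tol=1e-12):
    """Check 2 TV <= L2 distance <= lambda_2^t * (initial L2 distance) for t <= max_time."""
    pi = 1.0 / n                                     # uniform stationary law
    lam2 = (1 + math.cos(2 * math.pi / n)) / 2       # second largest eigenvalue
    mu = [1.0] + [0.0] * (n - 1)                     # point mass at vertex 0
    initial_l2 = math.sqrt(sum(pi * (m / pi - 1) ** 2 for m in mu))
    for t in range(max_time + 1):
        two_tv = sum(abs(m - pi) for m in mu)
        l2 = math.sqrt(sum(pi * (m / pi - 1) ** 2 for m in mu))
        if not (two_tv <= l2 + tol and l2 <= lam2 ** t * initial_l2 + tol):
            return False
        mu = lazy_cycle_step(mu)
    return True
```

Laziness matters here: it makes all eigenvalues non-negative, so $\lambda_2$ is also the largest non-trivial eigenvalue in absolute value, which is what the second inequality uses.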

Lemma A.2 (Hitting time from stationary tail estimates). Let $(\Omega, P, \pi)$ be a finite
irreducible reversible Markov chain. Let $A \subset \Omega$. Then for any $t \ge 0$ we have that

$$P_\pi[T_A > t] \le (1 - \pi(A)) \exp(-t\pi(A)/t_{\mathrm{rel}}). \qquad (A.2)$$

For a proof see [5, Lemma 3.5] (or Proposition 3.21 in conjunction with Theorem 3.33
and Corollary 3.34 in [2]).


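A.2. Proof of (1.8). We can assume that the chain displays pre-cutoff in separation, as if
not, there is nothing to prove. We know from (1.5) that for every $\varepsilon > 0$, $t_{\mathrm{mix}}^{(n)}(\varepsilon) \le t_{\mathrm{sep}}^{(n)}(\varepsilon)$.
Hence, in our case, it is sufficient to prove that

$$\limsup_{\varepsilon \to 0} \limsup_{n\to\infty} \frac{t_{\mathrm{sep}}^{(n)}(1 - \varepsilon)}{t_{\mathrm{mix}}^{(n)}(1 - 2\sqrt{\varepsilon})} \le 2, \qquad (A.3)$$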

as pre-cutoff implies that —^—L tends to 1 when n goes to infinity and e goes to 0 in 

Lep (1 

this order. 

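We shall show that for all n

$$t_{\mathrm{sep}}^{(n)}(1 - \varepsilon) \le 2 t_{\mathrm{mix}}^{(n)}(1 - 2\sqrt{\varepsilon}) + 2 t_{\mathrm{rel}}^{(n)} \log \varepsilon^{-1}. \qquad (A.4)$$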

This is sufficient to conclude as Fact ascertains that the second term is small in 
comparison to the first. 

To prove (A.4), let us introduce the following alternative way of measuring the distance
to equilibrium,

$$\bar d(t) := \max_{x,y \in \Omega} \|P_x(X_t \in \cdot) - P_y(X_t \in \cdot)\|_{TV}. \qquad (A.5)$$

From m Lemma 19.3] (or (16.91) 1 we know that for every t we have 

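$$d_{\mathrm{sep}}(t) \le 1 - (1 - \bar d(t/2))^2. \qquad (A.6)$$

Hence if one defines $\bar t_{\mathrm{mix}}(\varepsilon)$ to be the first time at which $\bar d(t) \le \varepsilon$, we have for every $\varepsilon > 0$

$$t_{\mathrm{sep}}(1 - \varepsilon) \le 2 \bar t_{\mathrm{mix}}(1 - \sqrt{\varepsilon}). \qquad (A.7)$$

Now to conclude the proof of (A.3) we need to show that (under reversibility)

$$\bar t_{\mathrm{mix}}(1 - \sqrt{\varepsilon}) \le t_{\mathrm{mix}}(1 - 2\sqrt{\varepsilon}) + t_{\mathrm{rel}} \log \varepsilon^{-1}. \qquad (A.8)$$

Let us set

$$t = t_{\mathrm{mix}}(1 - 2\sqrt{\varepsilon}) \quad \text{and} \quad s = t_{\mathrm{rel}} \log \varepsilon^{-1}. \qquad (A.9)$$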

For $x \in \Omega$, we set $\mu_x := P_x(X_t \in \cdot)$ and

$$\mu_x^1(z) := \frac{\min(\mu_x(z), \pi(z))}{1 - \|\mu_x - \pi\|_{TV}} \quad \text{and} \quad \mu_x^2(z) := \frac{(\mu_x(z) - \pi(z))_+}{\|\mu_x - \pi\|_{TV}}. \qquad (A.10)$$

Both are probability measures and $\mu_x$ can be written as a linear combination of the two:

$$\mu_x = (1 - \|\mu_x - \pi\|_{TV}) \mu_x^1 + \|\mu_x - \pi\|_{TV} \, \mu_x^2.$$

For $\mu$ a distribution on $\Omega$, $f, g \in \mathbb{R}^\Omega$ and $1 \le p < \infty$, we introduce the notation

$$\mu P^s(x) := \sum_{y \in \Omega} \mu(y) P^s(y,x), \quad P^s f(x) := \sum_{y \in \Omega} P^s(x,y) f(y), \quad \|g\|_p := \Big( \sum_{y} \pi(y) |g(y)|^p \Big)^{1/p}.$$

Assuming that x and y maximize the l.h.s. in (A.5) at time t + s and that

$$\|\mu_x - \pi\|_{TV} \ge \|\mu_y - \pi\|_{TV},$$

then by the triangle inequality we have that

$$\bar d(t+s) = \|\mu_x P^s - \mu_y P^s\|_{TV}
\le (1 - \|\mu_x - \pi\|_{TV}) \|\mu_x^1 P^s - \mu_y^1 P^s\|_{TV} + \|\mu_x - \pi\|_{TV} \|\mu_x^2 P^s - \tilde \mu_y P^s\|_{TV}, \qquad (A.11)$$

where $\tilde \mu_y$ is an adequate linear combination of $\mu_y^1$ and $\mu_y^2$. Using the definition of t we
have $\|\mu_x - \pi\|_{TV} \le 1 - 2\sqrt{\varepsilon}$, and hence we have

$$\bar d(t+s) \le 1 - 2\sqrt{\varepsilon} + \|\mu_x^1 P^s - \mu_y^1 P^s\|_{TV}. \qquad (A.12)$$

Now to estimate the second term we set $f(z) = \frac{\mu_x^1(z)}{\pi(z)} - \frac{\mu_y^1(z)}{\pi(z)}$. Observe that by reversibility
$\frac{\mu_x^1 P^s(z) - \mu_y^1 P^s(z)}{\pi(z)} = P^s f(z)$. Note that by definition $|f(z)| \le (1 - \|\mu_x - \pi\|_{TV})^{-1} \le 2\varepsilon^{-1/2}$. Let
$1 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_{|\Omega|}$ be the eigenvalues of P and set $\lambda := \max(\lambda_2, |\lambda_{|\Omega|}|)$. Using the
spectral decomposition of f along with the fact that $\sum_z \pi(z) f(z) = 0$ (and finally, the
choice of s) it is standard to show that $\|P^s f\|_2 \le \lambda^s \|f\|_2 \le \lambda^s (2\varepsilon^{-1/2}) \le 2\sqrt{\varepsilon}$. Hence

$$\|\mu_x^1 P^s - \mu_y^1 P^s\|_{TV} = \tfrac{1}{2} \|P^s f\|_1 \le \tfrac{1}{2} \|P^s f\|_2 \le \sqrt{\varepsilon},$$

as desired. □
A.3. Proof of Lemma 6.3. By decomposing over the possible values of $T_Z$, using the
assumption that Z is balanced seen from x and reversibility (which implies that $P_{\pi_Z}^s(y)/\pi(y) =
P_y^s(Z)/\pi(Z)$, for all s), we get that

$$\frac{P^t(x,y)}{\pi(y)} \ge \sum_{k_1 \le t} P_x[T_Z = k_1] \frac{P_{\pi_Z}^{t-k_1}(y)}{\pi(y)} = \sum_{k_1 \le t} P_x[T_Z = k_1] \frac{P_y^{t-k_1}(Z)}{\pi(Z)}$$

$$= \sum_{\substack{0 \le k_1 \le t \\ 0 \le k_2 \le k_1}} P_x[T_Z = k_2] \, P_y[T_Z = k_1 - k_2] \frac{P_{\pi_Z}^{t-k_1}(Z)}{\pi(Z)} = \sum_{k \le t} P[T_Z^{x,y} = k] \frac{P_{\pi_Z}^{t-k}(Z)}{\pi(Z)}. \qquad (A.13)$$


In particular, we have equality if $P_x[T_Z < T_y] = 1$ (i.e. in case (ii)). To conclude, for
(i) we note that for lazy irreducible reversible chains $P_{\pi_Z}^t(Z) \ge \pi(Z)$, which can be
easily verified using the spectral decomposition and the non-negativity of the eigenvalues
of P. For (ii) note that by (A.1)

$$P_{\pi_Z}^t(Z) - \pi(Z) \le \|P_{\pi_Z}^t - \pi\|_{TV} \le \tfrac{1}{2} \|\pi_Z - \pi\|_2 \, \lambda_2^t \le \tfrac{1}{2} \lambda_2^t \sqrt{(1 - \pi(Z))/\pi(Z)}.$$

The first inequality in (6.4) is obtained by plugging the last estimate in the second term
of (6.4). The second inequality in (6.4) follows from the estimate

$$\sum_{k \le t} P[T_Z^{x,y} = k] \lambda_2^{t-k} \le \Big( \max_{k \ge 0} P[T_Z^{x,y} = k] \Big) \sum_{k \ge 0} \lambda_2^k = \max_{k \ge 0} P[T_Z^{x,y} = k]/(1 - \lambda_2). \qquad \Box$$

A.4. Proof of Proposition 6.4. The result is mostly a consequence of the following
result which relates the mixing time starting from x to the hitting time of a set Z balanced
seen from x.

Lemma A.3. Let $(\Omega, P, \pi)$ be a finite lazy irreducible reversible Markov chain. Let $Z \subset \Omega$
(we denote its complement by $Z^c$) and $x \in \Omega$. Given $0 < \varepsilon < 1$, set

$$t_{x,Z}(p) := \min\{t : P_x[T_Z > t] \le p\},$$

$$s_\varepsilon := \left\lceil t_{\mathrm{rel}} \log[\pi(Z^c)/\varepsilon]/\pi(Z) \right\rceil \quad \text{and} \quad r_\varepsilon := \left\lceil t_{\mathrm{rel}} \log[\pi(Z^c)/(\pi(Z)\varepsilon^2)]/2 \right\rceil.$$

Let $s' := \max(t_{x,Z}(p) - s_\varepsilon, 0)$. Then we have

$$\|P_x^{s'} - \pi\|_{TV} \ge p - \varepsilon. \qquad (A.14)$$

Moreover, if Z is balanced seen from x then we also have that

$$\|P_x^{t_{x,Z}(p) + r_\varepsilon} - \pi\|_{TV} \le p + \varepsilon. \qquad (A.15)$$

Proof. The first result is proved by coupling the chain with initial distribution $P_x^{k - s_\varepsilon}$ with
the stationary chain ($k \ge s_\varepsilon$ to be determined soon). We have

$$P_x[T_Z > k] \le \|P_x^{k - s_\varepsilon} - \pi\|_{TV} + P_\pi[T_Z > s_\varepsilon] \le \|P_x^{k - s_\varepsilon} - \pi\|_{TV} + \varepsilon, \qquad (A.16)$$

where the last inequality is a consequence of (A.2) and the choice of $s_\varepsilon$. Setting $k = t_{x,Z}(p)$
we obtain the result (as if s' = 0 there is nothing to prove).

We now prove (A.15). By the assumption that Z is balanced seen from x, for all $\ell \le t$

$$P_x^t = P_x[T_Z > \ell] \, P_x[X_t \in \cdot \mid T_Z > \ell] + \sum_{0 \le i \le \ell} P_x[T_Z = \ell - i] \, P_{\pi_Z}^{t - \ell + i}. \qquad (A.17)$$

By the triangle inequality and the fact that the distance to $\pi$ decreases in time, we obtain

$$\|P_x^t - \pi\|_{TV} \le P_x[T_Z > \ell] + \sum_{0 \le i \le \ell} P_x[T_Z = \ell - i] \, \|P_{\pi_Z}^{t - \ell + i} - \pi\|_{TV}
\le P_x[T_Z > \ell] + \|P_{\pi_Z}^{t - \ell} - \pi\|_{TV}.$$

Using this inequality for $\ell := t_{x,Z}(p)$ (and so $t - \ell = r_\varepsilon$) we only have to show that
$\|P_{\pi_Z}^{t - \ell} - \pi\|_{TV} = \|P_{\pi_Z}^{r_\varepsilon} - \pi\|_{TV} \le \varepsilon$. Combining (A.1) with the definition of $r_\varepsilon$, we have
that

$$\|P_{\pi_Z}^{r_\varepsilon} - \pi\|_{TV} \le \lambda_2^{r_\varepsilon} \sqrt{\pi(Z^c)/\pi(Z)} \le \varepsilon. \qquad (A.18)$$

□

We can now proceed to the proof of Proposition 6.4. With our assumptions on $t_{\mathrm{rel}}$
and $Z_n$, Lemma A.3 allows us to show that the mixing time starting from x and $t_{x,Z_n}(p)$ are
equivalent when $Z_n$ is balanced seen from x (i.e. for $x \in I_n$). Assumption (iv) ensures that
what occurs for other initial conditions does not matter and Assumption (iii) establishes
that a is the worst initial condition.

□

A.5. Proof of Proposition 6.5. From Lemma 6.3 and assumptions (i), (ii) and (v) we
know that $P_n^t(x,y)/\pi_n(y)$ and $P[T_{Z_n}^{x,y} \le t]$ differ only by a negligible amount, provided
that $x \in A_n$ and $y \in B_n$. Assumption (iv) ensures then that

$$\liminf_{n\to\infty} \inf_{t>0} \min_{(x,y) \in A_n \times B_n} \left( \frac{P_n^t(x,y)}{\pi_n(y)} - \frac{P_n^t(a_n,b_n)}{\pi_n(b_n)} \right) \ge 0. \qquad (A.19)$$

We are left with checking the other cases. Assumption (vi) takes care of most of them, and
leaves the case where $(x,y) \in B_n \times B_n$, for which Lemma 6.3 implies that $P[T_{Z_n}^{x,y} \le t]$ is a
lower bound for $P_n^t(x,y)/\pi_n(y)$. Hence the conclusion follows by assumption (v) again. □


A.6. A short alternative proof of Theorem [O We are going to show that there exists 
an absolute constant $c > 0$ such that for any lazy chain

\[ t_{\mathrm{mix}}(1/4) - t_{\mathrm{mix}}(3/4) \ge c\sqrt{t_{\mathrm{mix}}(3/4)}. \tag{A.20} \]

Indeed, set $t := t_{\mathrm{mix}}(1/4)$ and $s := \lfloor c\sqrt{t} \rfloor$. A sample of the distribution of the lazy chain at time $t$ can be generated by running the non-lazy version of the chain for $\xi_t$ steps, where $\xi_t \sim \mathrm{Bin}(t, 1/2)$ is independent of the non-lazy version of the chain. By the triangle inequality (first inequality) and a standard coupling argument (second inequality) we have

\[ \forall t, s \ge 0, \quad d(t) - d(t+s) \le \max_{x \in \Omega} \|\mathrm{P}_x^t - \mathrm{P}_x^{t+s}\|_{\mathrm{TV}} \le \|\xi_t - \xi_{t+s}\|_{\mathrm{TV}}. \]

Moreover, if $c$ is chosen sufficiently small, we have for every $t > 0$ that $\|\xi_t - \xi_{t+\lfloor c\sqrt{t} \rfloor}\|_{\mathrm{TV}} \le 1/2$. □
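
The last claim of the proof compares two binomial laws: the number of non-lazy steps taken in $t$ lazy steps is Bin(t, 1/2), and in $t + \lfloor c\sqrt{t} \rfloor$ lazy steps it is Bin(t + ⌊c√t⌋, 1/2). The following Python sketch evaluates that total-variation distance exactly for a few values of $t$. It is only a numerical illustration, not part of the argument: the function names and the choice c = 0.5 are assumptions made for this example and do not come from the paper.

```python
from math import comb, floor, sqrt


def binomial_pmf(n, k):
    """P[Bin(n, 1/2) = k]; the binomial coefficient is an exact integer."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) / 2**n


def tv_binomials(t, s):
    """Total-variation distance between Bin(t, 1/2) and Bin(t + s, 1/2)."""
    return 0.5 * sum(
        abs(binomial_pmf(t, k) - binomial_pmf(t + s, k))
        for k in range(t + s + 1)
    )


def tv_after_extra_steps(t, c):
    """TV distance between Bin(t, 1/2) and Bin(t + floor(c * sqrt(t)), 1/2)."""
    s = floor(c * sqrt(t))
    return tv_binomials(t, s)


if __name__ == "__main__":
    c = 0.5  # illustrative value only, not the constant from the proof
    for t in (1, 4, 16, 100, 400, 1000):
        print(t, tv_after_extra_steps(t, c))
```

For this choice of c the computed distances stay below 1/2 for the listed values of $t$, which is the behaviour the proof requires; a larger c would eventually violate the bound, since the two binomial means then differ by more standard deviations.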


References 

[1] M. Ajtai. Recursive construction for 3-regular expanders. Combinatorica 14-4 (1994): 379-416.

[2] D. Aldous, J. Fill. Reversible Markov chains and random walks on graphs. In preparation,
http://www.stat.berkeley.edu/~aldous/RWG/book.html.

[3] N. Alon, V. Milman. $\lambda_1$, isoperimetric inequalities for graphs, and superconcentrators. Journal of
Combinatorial Theory, Series B 38.1 (1985): 73-88.

[4] N. Alon. Eigenvalues and expanders. Combinatorica 6.2 (1986): 83-96.

[5] R. Basu, J. Hermon, Y. Peres. Characterization of cutoff for reversible Markov chains, preprint: 
http://arxiv.org/abs/1409.3250 

[6] G. Y. Chen, L. Saloff-Coste. Comparison of cutoffs between lazy walks and Markovian semigroups. 
Journal of Applied Probability 50-4 (2013): 943-959. 

[7] G. Y. Chen, L. Saloff-Coste. The cutoff phenomenon for ergodic Markov processes. Electronic Journal 
of Probability 13.3 (2008): 26-78. 

[8] A. Dembo, O. Zeitouni. Large Deviations Techniques and Applications, 2nd Edition. Stochastic Modelling
and Applied Probability 38, Springer.

[9] P. Diaconis, L. Saloff-Coste. Separation cutoff for birth and death chains. The Annals of
Applied Probability 16-4 (2006): 2098-2122.

[10] J. Ding, E. Lubetzky, Y. Peres. Total variation cutoff in birth-and-death chains. Probability Theory
and Related Fields 146.1-2 (2010): 61-85.

[11] J. A. Fill. The passage time distribution for a birth-and-death chain: Strong stationary duality gives 
a first stochastic proof. Journal of Theoretical Probability 22.3 (2009): 543-557. 

[12] J. Hermon. A technical report on hitting times, mixing and cutoff. arXiv:1501.01869 [math.PR].

[13] S. Karlin, J. McGregor. Coincidence properties of birth and death processes. Pacific J. Math. 9
(1959), 1109-1140. 

[14] J. Keilson. Markov chain models, rarity and exponentiality. Applied Mathematical Sciences, vol. 28,
Springer-Verlag, New York, 1979.

[15] H. Lacoin. Mixing time and cutoff for the adjacent transposition shuffle and the simple exclusion,
(preprint) arXiv:1309.3873.

[16] D. Levin, Y. Peres, E. Wilmer, Markov Chains and Mixing Times. American Mathematical Society, 
Providence, RI, (2009). 









[17] E. Lubetzky, A. Sly. Cutoff phenomena for random walks on random regular graphs. Duke Mathematical
Journal 153.3 (2010): 475-510.

[18] E. Lubetzky, A. Sly. Explicit expanders with cutoff phenomena. Electronic Journal of Probability 16.15
(2011): 419-435.

[19] E. Lubetzky, A. Sly. Cutoff for the Ising model on the lattice, Inventiones Mathematicae 191 (2013), 
719-755. 

[20] O. Reingold, S. Vadhan, A. Wigderson. Entropy waves, the zig-zag graph product, and new
constant-degree expanders. Annals of Mathematics (2002): 157-187.

[21] A. Sinclair, M. Jerrum. Approximate counting, uniform generation and rapidly mixing Markov chains. 
Information and Computation 82.1 (1989): 93-133. 


